Re: [sqlite] Min/Max and skip-scan optimizations

2019-02-10 Thread Rowan Worth
On Tue, 5 Feb 2019 at 22:46, Simon Slavin  wrote:

> On 5 Feb 2019, at 8:59am, Rowan Worth  wrote:
>
> > SELECT source1, source2, ts, value
> > FROM rolling
> > WHERE source1 = 'aaa'
> >  AND ts > 1 AND ts < 1
> > ORDER BY source1, source2, ts;
> >
> > And this index:
> >
> > CREATE INDEX `sources` ON `rolling` (
> >`source1`,
> >`source2`,
> >`ts`
> > );
> >
> > What is stopping sqlite's query planner from taking advantage of the
> index, which it has chosen to use for the query, to also satisfy the ORDER
> BY?
>
> I suspect that, given the data in the table, the index supplied is not
> optimal for selecting the correct rows from the table.  SQLite may have
> decided that it needs to select on the contents of ts first, then source1.
>

This seems like a reasonable hypothesis, and explains one of Gerlando's
observations (sqlite _did_ decide to use an index on `ts` in a different
version of the DB). However, the EXPLAIN QUERY PLAN output demonstrates
that it _is_ using the `sources` index when that's the only one available:

QUERY PLAN
|--SEARCH TABLE rolling USING INDEX sources (ANY(source1) AND ANY(source2) AND ts>? AND ts<?)
`--USE TEMP B-TREE FOR RIGHT PART OF ORDER BY
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Min/Max and skip-scan optimizations

2019-02-06 Thread R Smith



On 2019/02/06 12:12 AM, Gerlando Falauto wrote:


The use case involves retaining as much data as the storage can possibly
hold (so a bunch of gigabytes).
I could've just used directories and logfiles instead of abusing a
relational database but I just thought it would be more convenient to issue a
query and use a cursor.


There is nothing wrong with using a database to store a list of values; as 
another poster pointed out, [insert high made-up number]% of schemata out 
there are basically just that. But I like Keith's suggestion: since you did 
decide to DB it, why not make it nicely relational too?



Due to this, scanning the table is often more efficient than threading a lookup via another
index into the query plan. Sometimes crafting a new temp BTree Index for
(a) specific field(s) on a materialized set of data might also be judged
faster than re-establishing links between said data and its original Index.


Do you think restoring the original primary key (instead of ROWID) and
dropping the index would make any difference?


I do think it would make a difference (almost any change would), but I 
am not sure it would make all the difference. I would, however, suggest 
that you at the very least test this and see.






I pre-populated the table with a realistic use case scenario and ran
ANALYZE.
I'm not planning on using ANALYZE on the real system -- though I might
indeed pre-populate sqlite_stat1 with typical values as suggested in the
docs.


This is fine. I would ask: did your "test" data include a gigabyte or 
more of data? The amount, cardinality and shape of the data are all most 
important for ANALYZE to provide good information.
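
(As a quick check, the statistics ANALYZE gathers live in an ordinary table; a 
minimal sketch:)

-- inspect what ANALYZE recorded; one row per table/index pair
SELECT tbl, idx, stat FROM sqlite_stat1;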



If you can demonstrate a true degradation //...

Yes, in the worst case, adding the ORDER BY clause (2 vs.1, 4 vs.3) leads
to a perceivable degradation in terms of both seek time (several seconds
vs. milliseconds to get the first row) and occupied disk space.


I must have missed this, apologies; that is certainly a very real 
degradation. Note that the entire query delivery should be taken into 
consideration. The first row can often be delivered near instantaneously, 
with following rows taking progressively longer. Very often the time it 
takes to deliver the first row is compensated by the time that is saved 
later along the subsequent rows. The QP takes this into consideration.


A good example is getting a set of data, say 100 rows, sorting it first 
and then just spitting it out from memory. The preparation (aka first-row 
delivery) will take time; all the rest will be instant. Contrast 
that with a query that needs no sorting: it might produce rows as it 
scans the table, the first of which might appear instantly (since it's 
at the top of the table and satisfies the WHERE clause), but the 
next qualifying rows might take a long while to produce depending on 
where they fall within the table. In the end the fully traversed cursor 
may take a similar amount of time.
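
To make the contrast concrete, a rough sketch against the rolling table from 
this thread (assuming no index covers the sort column):

-- sorter path: the first row waits for the whole sort, the rest then stream out of memory
SELECT * FROM rolling WHERE source1 = 'aaa' ORDER BY value;

-- scan path: the first matching row can appear immediately, later matches trickle in as the scan proceeds
SELECT * FROM rolling WHERE source1 = 'aaa';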


The QP cannot know beforehand how many "hits" it will encounter, so it 
has to use a basic pre-made guide and/or help from the ANALYZE data to 
best guess which route is better - and you can easily construct an 
unusual set of data for which it will choose wrong every time, and for 
which "fixing" it will negatively affect more common sets of data.




As I already said, my use case *is* quite unusual. Definitely not something
you'd normally use a relational database for.


That does not matter - if the query planner can do better, it should - 
unless of course changing the decision tree will negatively affect 
another more widely used query case. (This is the hard part to establish.)



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Min/Max and skip-scan optimizations

2019-02-05 Thread D Burgess
On Wed, Feb 6, 2019 at 11:26 AM Keith Medcalf  wrote:
"you have not normalized the data before storing it"

This is true of most of the hundreds, if not thousands, of schema that I
have seen.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Min/Max and skip-scan optimizations

2019-02-05 Thread Keith Medcalf

On Tuesday, 5 February, 2019 15:12, Gerlando Falauto wrote:

> I could've just used directories and logfiles instead of abusing
> a relational database but I just thought it would be more convenient
> to issue a query and use a cursor.

Well, the "abusing a relational database" is the correct terminology because 
you have not normalized the data before storing it.  Failure to normalize 
causes all sorts of problems when you attempt to "retrieve" and "store" data -- 
if the data is not normalized you should not really expect anything more 
performant than would be obtained by simply sequentially scanning sequential 
log files.  Also, relational databases do not have cursors (cursors are an 
illusion).  Relational databases operate on "sets" of data all at one get-go.  
The appearance of "sequential execution row by row" is merely an artifice of 
the current limits of computer technology, and what some refer to as "cursors" 
are merely implementation details or "magic programming" allowing access to a 
"set" in a row by row fashion.

You probably want separate tables for "source1" and for "source2" because these 
are not keys of the events, they are attributes of the events, and contain 
duplicate data.  Similarly the timestamp is an attribute of an event, not a key 
of an event.  The only true key of the event is a pseudo-key conveying the 
order in which the events happened and nothing more (for which a simple record 
number or "integer primary key" will suffice).

Now you have to decide whether or not "source2" is dependent on "source1" or 
not.  This is rather simple.  If the same "source2" can occur in combination 
with several "source1" then it is independent.  (This also depends on what you 
want to do with the data after you have it.)

So now you have the following minimally normalized schema:

create table Source1
(
  id integer primary key,
  name text collate nocase unique
);
create table Source2
(
  id integer primary key,
  name text collate nocase unique
);
create table Events
(
  id integer primary key,
  Source1id integer not null references Source1,
  Source2id integer not null references Source2,
  ts real not null
  --- other data items ---
);
create index EventSource1 on Events (Source1id, Source2id);
create index EventSource2 on Events (Source2id, Source1id);
create index EventTS on Events (ts);
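
For illustration only, a minimal sketch of how one event might be inserted 
against this schema (the 'aaa'/'GATE' values and the timestamp are just 
made-up examples):

-- the unique constraint on name makes the lookup-table inserts idempotent
insert or ignore into Source1 (name) values ('aaa');
insert or ignore into Source2 (name) values ('GATE');
-- store the event by reference to the two lookup tables
insert into Events (Source1id, Source2id, ts)
values ((select id from Source1 where name = 'aaa'),
        (select id from Source2 where name = 'GATE'),
        1.0);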

So now if you want all the combinations of source1 and source2 it is simply:

  select source1.name, source2.name
    from source1, source2
   order by 1, 2;

If you only want them where there are events then:

  select source1.name, source2.name
    from source1, source2
   where exists (select *
                   from events
                  where events.source1id == source1.id
                    and events.source2id == source2.id)
   order by 1, 2;

If you want all the stuff between two ts you can do:

select source1.name, source2.name, events.*
  from source1, source2, events
 where source1.id == events.source1id
   and source2.id == events.source2id
   and events.id between coalesce((select max(id)
                                     from events
                                    where ts <= :StartTime),
                                   (select min(id)
                                     from events))
                     and coalesce((select min(id)
                                     from events
                                    where ts >= :EndTime),
                                   (select max(id)
                                     from events))
 order by events.id;

If you only want the data for source2 = 'GATE' you can do the same and add:

   and source2.name == 'GATE'

or probably more efficient:

   and events.source2id == (select id from source2 where name == 'GATE')

and even other combinations:

   and events.source1id in (select id from source1 where name in ('ABA', 'AXX'))
   and events.source2id in (select id from source2 where name == 'GATE')

and so on and so forth, and let the query optimizer decide the best way to 
actually retrieve the data you requested.


---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.




___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Min/Max and skip-scan optimizations

2019-02-05 Thread Simon Slavin
On 5 Feb 2019, at 10:12pm, Gerlando Falauto  wrote:

> I actually started off with source1,source2,ts as the primary key and for 
> some reason (which I no longer remember) I thought it would be wise to use a 
> ROWID and add an index instead.

That is probably the right solution.  There are reasons to keep the primary key 
very short, and ROWID is about as short as a SQLite value can be.

> [...]
> 
> I pre-populated the table with a realistic use case scenario and ran ANALYZE.
> I'm not planning on using ANALYZE on the real system -- though I might indeed 
> pre-populate sqlite_stat1 with typical values as suggested in the docs.

Having run ANALYZE on your 'typical' data, you can copy the resulting 
sqlite_stat1 table onto production systems, even though the tables it refers to 
are empty.  ANALYZE not only analyzes the contents of your tables but also your 
indexes.  If you add or drop indexes, it's useful to run ANALYZE again.
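
One way to do that copy is a plain SQL sketch (the attached name 'tuned' and 
the file name are only examples); the planner reads sqlite_stat1 when it loads 
the schema, so re-open the connection afterwards:

ANALYZE;                                        -- ensures sqlite_stat1 exists on this database
ATTACH DATABASE 'typical_analyzed.db' AS tuned; -- the database you ran ANALYZE on
DELETE FROM main.sqlite_stat1;
INSERT INTO main.sqlite_stat1 (tbl, idx, stat)
     SELECT tbl, idx, stat FROM tuned.sqlite_stat1;
DETACH DATABASE tuned;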

ANALYZE is the fastest, easiest tool to provide SQLite with the 
information it needs to pick the best query plan.  As a human you can't 
possibly hope to match it.  If your system has a yearly maintenance procedure, 
it's good to include ANALYZE.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Min/Max and skip-scan optimizations

2019-02-05 Thread Gerlando Falauto
Hi Ryan,

first of all thank you for your patience and contribution.

[]

>
> Add to that the fact that an SQLite TABLE is, in and of itself, nothing
> less than a covering Index with row_id as a key (or a custom key for
> WITHOUT ROWID tables), and as such it is a rather good Index and a
> mostly preferred Index by the query planner (because "using" any other
> index adds cycles plus an extra row_id lookup).


That's quite interesting indeed.
I actually started off with source1,source2,ts as the primary key and for
some reason (which I no longer remember) I thought it would be wise to use
a ROWID and add an index instead.
Please bear in mind this table is nothing more than a continuous log with a
two-columned source identifier and a timestamp.
The use case involves retaining as much data as the storage can possibly
hold (so a bunch of gigabytes).
I could've just used directories and logfiles instead of abusing a
relational database but I just thought it would be more convenient to issue a
query and use a cursor.

Due to this, scanning
> the table is often more efficient than threading a lookup via another
> index into the query plan. Sometimes crafting a new temp BTree Index for
> (a) specific field(s) on a materialized set of data might also be judged
> faster than re-establishing links between said data and its original Index.
>

Do you think restoring the original primary key (instead of ROWID) and
dropping the index would make any difference?

>
> The method by which the query planner decides which other Index (if any)
> should be used involves a bit of game theory, typically looking at some
> ANALYZE result data along with some tried and tested weights in the
> decision tree (which I'm not going into since A - It's not important,
> and B - I don't know enough of how SQLite does it). If the end score
> finds that there is no remarkable advantage to using a separate index,
> then it WILL opt to use the more-efficient table scan.
>

I pre-populated the table with a realistic use case scenario and ran
ANALYZE.
I'm not planning on using ANALYZE on the real system -- though I might
indeed pre-populate sqlite_stat1 with typical values as suggested in the
docs.

It might be that the adding of the "ORDER BY" simply pushes one such
> decision weight over the edge in this use case, and, once the table data
> evolved to be more complex or hefty, it may again turn to the Index.
>

> To add to another poster's comment: Do not second-guess the
> Query-planner, leave it to its devices. You may even be able to
> construct a scenario where the specific use case causes the QP to choose
> an execution path that is slightly slower than an alternate one, but if
> it is looked at in the general case, then other similar query scenarios
> might again be faster with that chosen path. Further to this, if you
> construct a weird query now to force a path of execution with some gain,
> you possibly prohibit it from capitalizing on an even better improvement
> that might be inherent to the next SQLite update (possibly thanks to
> your very own report here).
>
>
I could not agree more.



> If you can demonstrate a true degradation (one that slows down a
> significant time slice that trespasses on human-perceptible time) for a
> general query, an optimization will surely be considered, but this case,
> unless I've misunderstood the severity, does not seem to warrant that.
>

Yes, in the worst case, adding the ORDER BY clause (2 vs.1, 4 vs.3) leads
to a perceivable degradation in terms of both seek time (several seconds
vs. milliseconds to get the first row) and occupied disk space.

Keith's approach (5) seems to somehow mitigate the effect by only adding
some +33% execution time (40s vs. 30s), at the cost of a "weird" query,
one which I would never have thought of by myself and whose query plan
I don't quite understand.


> [PS: this is not a discouragement, it's great to hear of every possible
> quirk and make other users aware of a possible query scenario that might
> not be optimal - thanks for that, and I'm certain the devs would notice
> this, perhaps even get on fixing it right away, or maybe only keep it in
> the back of their minds for when the next round of query-planner
> refinement happens. I'm simply saying that there is possibly no
> satisfying answer to your question right now - best we can do is:
> "Sometimes the QP correctly evaluates the best path to be one that is
> not obviously best to us, or maybe even worse for a specific case, but
> typically better in the general case".]
>

As I already said, my use case *is* quite unusual. Definitely not something
you'd normally use a relational database for.
So I'm not surprised it's not the use case SQLite is optimized against.

Thank you everyone for contributing to this conversation!
Gerlando
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org

Re: [sqlite] Min/Max and skip-scan optimizations

2019-02-05 Thread R Smith



On 2019/02/05 4:46 PM, Simon Slavin wrote:

On 5 Feb 2019, at 8:59am, Rowan Worth  wrote:



What is stopping sqlite's query planner from taking advantage of the index, 
which it has chosen to use for the query, to also satisfy the ORDER BY?

I suspect that, given the data in the table, the index supplied is not optimal 
for selecting the correct rows from the table.  SQLite may have decided that it 
needs to select on the contents of ts first, then source1.


And to add to this:

An Index is nothing magical and not a save-the-World-from-every-monster 
type of device (as newer DB programmers often think). It's an expensive 
add-on that provides an ordered binary lookup which, given enough bulk, 
will eventually win the efficiency race over the extra computation it 
adds. (The more bulk, the more win).
(Some DB programmers, when they see the words "table scan" in any Query 
plan, immediately feel as if they have somehow failed to correctly 
optimize the query. This is silly - a table scan is often the most 
optimal solution).


Add to that the fact that an SQLite TABLE is, in and of itself, nothing 
less than a covering Index with row_id as a key (or a custom key for 
WITHOUT ROWID tables), and as such it is a rather good Index and a 
mostly preferred Index by the query planner (because "using" any other 
index adds cycles plus an extra row_id lookup). Due to this, scanning 
the table is often more efficient than threading a lookup via another 
index into the query plan. Sometimes crafting a new temp BTree Index for 
(a) specific field(s) on a materialized set of data might also be judged 
faster than re-establishing links between said data and its original Index.


The method by which the query planner decides which other Index (if any) 
should be used involves a bit of game theory, typically looking at some 
ANALYZE result data along with some tried and tested weights in the 
decision tree (which I'm not going into since A - It's not important, 
and B - I don't know enough of how SQLite does it). If the end score 
finds that there is no remarkable advantage to using a separate index, 
then it WILL opt to use the more-efficient table scan.


It might be that adding the "ORDER BY" simply pushes one such 
decision weight over the edge in this use case, and, once the table data 
evolves to be more complex or hefty, it may again turn to the Index.


To add to another poster's comment: Do not second-guess the 
Query-planner, leave it to its devices. You may even be able to 
construct a scenario where the specific use case causes the QP to choose 
an execution path that is slightly slower than an alternate one, but if 
it is looked at in the general case, then other similar query scenarios 
might again be faster with that chosen path. Further to this, if you 
construct a weird query now to force a path of execution with some gain, 
you possibly prohibit it from capitalizing on an even better improvement 
that might be inherent to the next SQLite update (possibly thanks to 
your very own report here).


If you can demonstrate a true degradation (one that slows down a 
significant time slice that trespasses on human-perceptible time) for a 
general query, an optimization will surely be considered, but this case, 
unless I've misunderstood the severity, does not seem to warrant that.


[PS: this is not a discouragement, it's great to hear of every possible 
quirk and make other users aware of a possible query scenario that might 
not be optimal - thanks for that, and I'm certain the devs would notice 
this, perhaps even get on fixing it right away, or maybe only keep it in 
the back of their minds for when the next round of query-planner 
refinement happens. I'm simply saying that there is possibly no 
satisfying answer to your question right now - best we can do is: 
"Sometimes the QP correctly evaluates the best path to be one that is 
not obviously best to us, or maybe even worse for a specific case, but 
typically better in the general case".]



Cheers,
Ryan


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Min/Max and skip-scan optimizations

2019-02-05 Thread Simon Slavin
On 5 Feb 2019, at 8:59am, Rowan Worth  wrote:

> SELECT source1, source2, ts, value
> FROM rolling
> WHERE source1 = 'aaa'
>  AND ts > 1 AND ts < 1
> ORDER BY source1, source2, ts;
> 
> And this index:
> 
> CREATE INDEX `sources` ON `rolling` (
>`source1`,
>`source2`,
>`ts`
> );
> 
> What is stopping sqlite's query planner from taking advantage of the index, 
> which it has chosen to use for the query, to also satisfy the ORDER BY?

I suspect that, given the data in the table, the index supplied is not optimal 
for selecting the correct rows from the table.  SQLite may have decided that it 
needs to select on the contents of ts first, then source1.

1) Put some typical data in the table.  The table should have an appropriate 
number of rows, and appropriate values in each field.
2) Add an index for (ts, source1, source2).
3) Add an index for (source1, ts, source2, value).
4) Add an index for (ts, source1, source2, value).
5) Run ANALYZE
6) Use EXPLAIN QUERY PLAN on the above SELECT to find which index SQLite chose.

You now know what SQLite thinks is the best index for the query.  You can 
delete the other indexes and run VACUUM.
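
A condensed sketch of steps 2) through 6) in the sqlite3 shell (the index 
names here are made up):

CREATE INDEX idx_ts_s1_s2     ON rolling (ts, source1, source2);
CREATE INDEX idx_s1_ts_s2_val ON rolling (source1, ts, source2, value);
CREATE INDEX idx_ts_s1_s2_val ON rolling (ts, source1, source2, value);
ANALYZE;
EXPLAIN QUERY PLAN
SELECT source1, source2, ts, value
  FROM rolling
 WHERE source1 = 'aaa' AND ts > 1 AND ts < 10
 ORDER BY source1, source2, ts;
-- note which index the plan names, then DROP the unused ones and VACUUM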

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Min/Max and skip-scan optimizations

2019-02-05 Thread Gerlando Falauto
Hi Rowan,

thank you for your kind support. You grasped the essence of my questions.
:-)
I'm using SQLite 3.25.00.

Thank you,
Gerlando




On Tue, Feb 5, 2019 at 9:59 AM Rowan Worth  wrote:

> On Tue, 5 Feb 2019 at 16:06, Simon Slavin  wrote:
>
> > On 5 Feb 2019, at 8:00am, Gerlando Falauto 
> > wrote:
> >
> > > Thank you for your explanations guys. All this makes perfect sense.
> > > I still can't find a solution to my problem though -- write a query
> that
> > is guaranteed to return sorted results, in some optimal way.
> >
> > Please state your table definition, and desired query including ORDER BY
> > clause.  Please also tell us whether the amount of space taken up by your
> > database file is important.  Then we will tell you how to make SQLite use
> > an efficient way to arrive at your desired result.
> >
>
> The table definition was literally the first thing in Gerlando's initial
> email, and the desired query has also been clarified. But I assume you
> didn't actually read the thread before commenting; if you had you would
> have also noticed that Gerlando was the first person to note that it isn't
> reliable to depend on the order of results coming out of a SELECT which
> doesn't have an ORDER BY clause.
>
> IMO it would be great if we could all move on from that well established
> fact and focus on the issue Gerlando is trying to raise. We have this
> query:
>
> SELECT source1, source2, ts, value
> FROM rolling
> WHERE source1 = 'aaa'
>   AND ts > 1 AND ts < 1
> ORDER BY source1, source2, ts;
>
> And this index:
>
> CREATE INDEX `sources` ON `rolling` (
> `source1`,
> `source2`,
> `ts`
> );
>
> What is stopping sqlite's query planner from taking advantage of the index,
> which it has chosen to use for the query, to also satisfy the ORDER BY?
> Instead it adds an extra TEMP B-TREE step to sort the results, which slows
> things down. Intuitively it seems there's a potential for optimisation
> here. Which doesn't mean it's feasible, but it would be a pretty good win
> to be able to provide ORDER BY for free in more circumstances so it's worth
> considering.
>
> Gerlando, what version of sqlite are you using?
>
> -Rowan
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Min/Max and skip-scan optimizations

2019-02-05 Thread Rowan Worth
On Tue, 5 Feb 2019 at 16:06, Simon Slavin  wrote:

> On 5 Feb 2019, at 8:00am, Gerlando Falauto 
> wrote:
>
> > Thank you for your explanations guys. All this makes perfect sense.
> > I still can't find a solution to my problem though -- write a query that
> is guaranteed to return sorted results, in some optimal way.
>
> Please state your table definition, and desired query including ORDER BY
> clause.  Please also tell us whether the amount of space taken up by your
> database file is important.  Then we will tell you how to make SQLite use
> an efficient way to arrive at your desired result.
>

The table definition was literally the first thing in Gerlando's initial
email, and the desired query has also been clarified. But I assume you
didn't actually read the thread before commenting; if you had you would
have also noticed that Gerlando was the first person to note that it isn't
reliable to depend on the order of results coming out of a SELECT which
doesn't have an ORDER BY clause.

IMO it would be great if we could all move on from that well established
fact and focus on the issue Gerlando is trying to raise. We have this query:

SELECT source1, source2, ts, value
FROM rolling
WHERE source1 = 'aaa'
  AND ts > 1 AND ts < 1
ORDER BY source1, source2, ts;

And this index:

CREATE INDEX `sources` ON `rolling` (
`source1`,
`source2`,
`ts`
);

What is stopping sqlite's query planner from taking advantage of the index,
which it has chosen to use for the query, to also satisfy the ORDER BY?
Instead it adds an extra TEMP B-TREE step to sort the results, which slows
things down. Intuitively it seems there's a potential for optimisation
here. Which doesn't mean it's feasible, but it would be a pretty good win
to be able to provide ORDER BY for free in more circumstances so it's worth
considering.

Gerlando, what version of sqlite are you using?

-Rowan
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Min/Max and skip-scan optimizations

2019-02-05 Thread Simon Slavin
On 5 Feb 2019, at 8:00am, Gerlando Falauto  wrote:

> Thank you for your explanations guys. All this makes perfect sense.
> I still can't find a solution to my problem though -- write a query that is 
> guaranteed to return sorted results, in some optimal way.

Please state your table definition, and desired query including ORDER BY 
clause.  Please also tell us whether the amount of space taken up by your 
database file is important.  Then we will tell you how to make SQLite use an 
efficient way to arrive at your desired result.

It's not possible for us to just tell you the answer.  This will involve a few 
steps on your part of testing.  So if you aren't already using the SQLite 
command-line tool, please install it and learn to run it.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Min/Max and skip-scan optimizations

2019-02-05 Thread Gerlando Falauto
Thank you for your explanations guys. All this makes perfect sense.
I still can't find a solution to my problem though -- write a query that is
guaranteed to return sorted results, in some optimal way.

Any suggestion welcome.

Thank you,
Gerlando

Il lun 4 feb 2019, 22:24 Simon Slavin  ha scritto:

> On 4 Feb 2019, at 9:14pm, James K. Lowden 
> wrote:
>
> > As Keith said, SQLite allows ORDER BY in subqueries.  The SQL standard
> does not.
>
> True.  But SQLite does not guarantee that the outer query will preserve
> the inner query's ORDER BY, even if the outer query doesn't have its own
> ORDER BY.  So be careful.
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Min/Max and skip-scan optimizations

2019-02-04 Thread Simon Slavin
On 4 Feb 2019, at 9:14pm, James K. Lowden  wrote:

> As Keith said, SQLite allows ORDER BY in subqueries.  The SQL standard does 
> not.  

True.  But SQLite does not guarantee that the outer query will preserve the 
inner query's ORDER BY, even if the outer query doesn't have its own ORDER BY.  
So be careful.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Min/Max and skip-scan optimizations

2019-02-04 Thread James K. Lowden
On Mon, 4 Feb 2019 18:55:33 +0100
Gerlando Falauto  wrote:

> I remember reading ORDER BY is only allowed in
> the outer query

As Keith said, SQLite allows ORDER BY in subqueries.  The SQL standard
does not.  

Logically, ORDER BY makes sense only for the outer query.  An SQL
SELECT statement describes a set of results, possibly presented in a
particular order. An "internal ORDER BY" describes neither selection
criteria nor presentation order.  

Technically, ORDER BY takes a tabular result as input and produces a
cursor as output.  It need not sort the results; all that's required is
that the cursor return successive rows in the prescribed order.  

SQLite extends ORDER BY with LIMIT.  Because the combination affects
more than just the order, it can be useful to use ORDER BY in a
subquery.  Now that window functions provide a (more convenient)
standard way to produce row numbers, LIMIT is a bit of an anachronism but, 
for reasons of backwards compatibility, is unlikely to be removed.  
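
For instance, a rough sketch of the two styles (window functions exist in 
SQLite 3.25+, the version the original poster reports using):

-- LIMIT/OFFSET paging:
SELECT * FROM rolling ORDER BY ts LIMIT 10 OFFSET 20;

-- the window-function equivalent for row numbering:
SELECT *
  FROM (SELECT rolling.*, row_number() OVER (ORDER BY ts) AS rn FROM rolling)
 WHERE rn BETWEEN 21 AND 30;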

--jkl
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Min/Max and skip-scan optimizations

2019-02-04 Thread Keith Medcalf

>I wonder if I'd be allowed to add an ORDER BY in the subquery and if
>that would make any difference -- I remember reading ORDER BY is only
>allowed in the outer query (which makes perfect sense).

Yes, you can use an order by in a subquery (either a correlated subquery or a 
table generating subquery).  And you can use one in the outer query which will 
be used to order the results.  Whether the order-by clause in a non-correlated 
subquery has any meaning depends on how the query planner decides to perform 
the query.  The order-by in a table generating subquery (that is not 
correlated) may be "pushed" to the outer query just as a where clause in the 
subquery may also get pushed to the outer query as well if the query is 
flattened or if doing so results in a more optimum plan, or so I have observed 
...
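
A minimal sketch of the kind of statement being discussed, using the rolling 
table from this thread:

-- an ORDER BY inside a subquery is accepted, and may be dropped or moved when the
-- subquery is flattened; only the outer ORDER BY guarantees the final result order
SELECT source1, source2, ts, value
  FROM (SELECT * FROM rolling WHERE source1 = 'aaa' ORDER BY source2, ts)
 ORDER BY source1, source2, ts;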

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Min/Max and skip-scan optimizations

2019-02-04 Thread Gerlando Falauto
On Mon, Feb 4, 2019 at 4:52 PM Simon Slavin  wrote:

> On 4 Feb 2019, at 1:55pm, Gerlando Falauto 
> wrote:
>
> > Or (most likely) my understanding of how data is retrieved is plain
> wrong...
>
> Or your understanding of how the current version of SQLite works is correct, but a
> later version of SQLite will have different optimizations and do things
> differently.  So at some point you'll update your libraries and suddenly
> things won't work any more.  So if you depend on sorting, future-proof your
> code by asking for sorting.
>

As I just wrote in reply to Luuk's comment, I'm not questioning the usage
of ORDER BY. It should be there. I just would expect it to add ZERO
overhead.
But again, perhaps I'm making the wrong assumption that using the index
data would be "already sorted", perhaps it isn't.
I wonder if I'd be allowed to add an ORDER BY in the subquery and if that
would make any difference -- I remember reading ORDER BY is only allowed in
the outer query (which makes perfect sense).


> By the way, here's an example of a SQL engine (not SQLite) not using an
> index when you thought it would.  Suppose you have a short table …just 40
> rows:
>
> CREATE TABLE MyTable (a INTEGER, b TEXT);
> CREATE UNIQUE INDEX MT_a ON MyTable (a);
> INSERT <40 rows of data into MyTable>
>
> SELECT a,b FROM MyTable ORDER BY a;
>
> The assumed plan would be to use the index to retrieve the row order, then
> to look up each retrieved row in the table to retrieve the value for b.
> This requires one index walk plus 40 table lookups.
>
> But the engine knows that 40 table lookups takes a long time.  It would be
> faster to read the table, then sort it internally.  It's a table with only
> 40 rows, so sorting it would be fast and take only a little memory.  That
> saves 40 lookups.
>
> So even though there's an index, it's not a covering index (it doesn't
> contain all the data needed) so it won't be used.
>

That's understandable and I was expecting that. That's why I populated the
test dataset with *muuuch* more data -- to avoid corner cases like this.

Thanks again for your patience ;-)
Gerlando
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Min/Max and skip-scan optimizations

2019-02-04 Thread Gerlando Falauto
Hi Luuk,

It says:
>
> SQLite *attempts* to use an index to satisfy the ORDER BY clause of a
> query when possible
>
>
> To be (absolutely!) SURE results are in the correct order, you need an
> ORDER BY.
>

No questioning about that. ORDER BY *must* be there in order to get the
results correctly sorted.

What I'm saying is it shouldn't add any extra overhead, which it apparently
does in my case.

Notice how ORDER BY does not add any overhead in all the trivial cases:

sqlite> explain query plan select * from rolling order by
source1,source2,ts;
QUERY PLAN
`--SCAN TABLE rolling USING INDEX sources
sqlite> explain query plan select * from rolling order by
source1,source2;
QUERY PLAN
`--SCAN TABLE rolling USING INDEX sources
sqlite> explain query plan select * from rolling order by source1;
QUERY PLAN
`--SCAN TABLE rolling USING INDEX sources
sqlite> explain query plan select source1,source2,ts from rolling order by
source1,source2,ts;
QUERY PLAN
`--SCAN TABLE rolling USING COVERING INDEX sources
sqlite> explain query plan select source1,source2,ts from rolling order by
source1,source2;
QUERY PLAN
`--SCAN TABLE rolling USING COVERING INDEX sources
sqlite> explain query plan select source1,source2,ts from rolling order by
source1;
QUERY PLAN
`--SCAN TABLE rolling USING COVERING INDEX sources

It's just that when things get a bit more complicated, I start getting
surprising results (for my use case -- which, I admit, is unusual at least).

Any suggestion welcome.
Thanks,
Gerlando
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Min/Max and skip-scan optimizations

2019-02-04 Thread Simon Slavin
On 4 Feb 2019, at 1:55pm, Gerlando Falauto  wrote:

> Or (most likely) my understanding of how data is retrieved is plain wrong...

Or your understanding of how the current version of SQLite works is correct, but a later 
version of SQLite will have different optimizations and do things differently.  
So at some point you'll update your libraries and suddenly things won't work 
any more.  So if you depend on sorting, future-proof your code by asking for 
sorting.

By the way, here's an example of a SQL engine (not SQLite) not using an index 
when you thought it would.  Suppose you have a short table …just 40 rows:

CREATE TABLE MyTable (a INTEGER, b TEXT);
CREATE UNIQUE INDEX MT_a ON MyTable (a);
INSERT <40 rows of data into MyTable>

SELECT a,b FROM MyTable ORDER BY a;

The assumed plan would be to use the index to retrieve the row order, then to 
look up each retrieved row in the table to retrieve the value for b.  This 
requires one index walk plus 40 table lookups.

But the engine knows that 40 table lookups takes a long time.  It would be 
faster to read the table, then sort it internally.  It's a table with only 40 
rows, so sorting it would be fast and take only a little memory.  That saves 40 
lookups.

So even though there's an index, it's not a covering index (it doesn't contain 
all the data needed) so it won't be used.
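
(For contrast, a covering variant would be a one-line change; MT_ab is a 
made-up name:)

CREATE INDEX MT_ab ON MyTable (a, b);
-- now every column the query touches lives in the index, so the engine can walk
-- MT_ab in order and skip the 40 per-row table lookups entirely
SELECT a, b FROM MyTable ORDER BY a;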

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Min/Max and skip-scan optimizations

2019-02-04 Thread Luuk


On 4-2-2019 14:55, Gerlando Falauto wrote:

Thank you Luuk, I understand your point.
However, the query plan already takes advantage of the index and should be
retrieving data in that order.
Reading the docs
https://www.sqlite.org/optoverview.html#order_by_optimizations my
understanding was that
SQLite would be taking advantage of that.
So perhaps my use case is too complicated (many columns -- some filtered,
some not -- skip/scan, all together) to make it obvious to the query
planner that data *is* already sorted.
Or maybe it never occurred to anyone that someone might be trying to do
something like that.
Or (most likely) my understanding of how data is retrieved is plain wrong...

Thank you!
Gerlando



It says:

SQLite *attempts* to use an index to satisfy the ORDER BY clause of a 
query when possible



To be (absolutely!) SURE results are in the correct order, you need an 
ORDER BY.


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Min/Max and skip-scan optimizations

2019-02-04 Thread Gerlando Falauto
Thank you Luuk, I understand your point.
However, the query plan already takes advantage of the index and should be
retrieving data in that order.
Reading the docs
https://www.sqlite.org/optoverview.html#order_by_optimizations my
understanding was that
SQLite would be taking advantage of that.
So perhaps my use case is too complicated (many columns -- some filtered,
some not -- skip/scan, all together) to make it obvious to the query
planner that data *is* already sorted.
Or maybe it never occurred to anyone that someone might be trying to do
something like that.
Or (most likely) my understanding of how data is retrieved is plain wrong...

Thank you!
Gerlando


On Mon, Feb 4, 2019 at 1:26 PM Luuk  wrote:

>
> On 3-2-2019 23:29, Gerlando Falauto wrote:
> > IMHO, adding the ORDER BY clause to query 1) above (i.e. query 2) should
> > ideally yield the exact same query plan.
> > In the end adding an ORDER BY clause on the exact same columns of the
> index
> > used to traverse the table, should be easily recognizable.
> > Knowing absolutely nothing about the internals though, I have no idea
> > whether this particular use case has been overlooked, or it would just be
> > unfeasible to handle it.
>
>
> In SQL, when doing a SELECT, the order of the results is undetermined
> (by definition).
>
> If you want/need the results to be ORDERed, you need to add 'ORDER BY'.
> This will always show as an extra step in your QUERY PLAN.
>
> One can never know (for sure) if the output of this is in the correct
> order:
>
> CREATE TABLE test(i primary key);
> INSERT INTO test values(4);
> INSERT INTO test values(2);
> INSERT INTO test values(3);
> INSERT INTO test values(1);
> SELECT i FROM test;
>
> 4
> 2
> 3
> 1
>
> To 'know' it is in the correct order one has to define an ORDER BY to
> specify in which order the data should be returned
>
> SELECT i FROM test order by 2*i+i%2;
>
> 1
> 2
> 3
> 4
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Min/Max and skip-scan optimizations

2019-02-04 Thread Luuk


On 3-2-2019 23:29, Gerlando Falauto wrote:

IMHO, adding the ORDER BY clause to query 1) above (i.e. query 2) should
ideally yield the exact same query plan.
In the end adding an ORDER BY clause on the exact same columns of the index
used to traverse the table, should be easily recognizable.
Knowing absolutely nothing about the internals though, I have no idea
whether this particular use case has been overlooked, or it would just be
unfeasible to handle it.



In SQL, when doing a SELECT, the order of the results is undetermined 
(by definition).


If you want/need the results to be ORDERed, you need to add 'ORDER BY'. 
This will always show as an extra step in your QUERY PLAN.


One can never know (for sure) if the output of this is in the correct order:

CREATE TABLE test(i primary key);
INSERT INTO test values(4);
INSERT INTO test values(2);
INSERT INTO test values(3);
INSERT INTO test values(1);
SELECT i FROM test;

4
2
3
1

To 'know' it is in the correct order one has to define an ORDER BY to 
specify in which order the data should be returned


SELECT i FROM test order by 2*i+i%2;

1
2
3
4



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Min/Max and skip-scan optimizations

2019-02-04 Thread Gerlando Falauto
Thanks Keith, I'll give it a go and let you know!
I still don't get how that differs from
2) or 4) below, though.

Thanks again!
Gerlando

Il dom 3 feb 2019, 00:27 Keith Medcalf  ha scritto:

>
> Like this?
>
> SELECT rolling.source1,
>        rolling.source2,
>        ts,
>        value
>   FROM (
>         select distinct source1,
>                         source2
>           from rolling
>          where source1 = 'aaa'
>        ) as x
>   JOIN rolling
>     ON rolling.source1 = x.source1
>    AND rolling.source2 = x.source2
>  WHERE ts > 1
>    AND ts < 10
> ORDER BY 1,2,3;
>
>
> ---
> The fact that there's a Highway to Hell but only a Stairway to Heaven says
> a lot about anticipated traffic volume.
>
> >-Original Message-
> >From: sqlite-users [mailto:sqlite-users-
> >boun...@mailinglists.sqlite.org] On Behalf Of Gerlando Falauto
> >Sent: Saturday, 2 February, 2019 15:20
> >To: SQLite mailing list
> >Subject: Re: [sqlite] Min/Max and skip-scan optimizations
> >
> >Hi,
> >it's me again, struggling with indices once again.
> >What I'm now trying to do is filter on source1 and by range of
> >timestamp.
> >Results should be naturally ordered by source1, source2,ts.
> >
> >1)
> >
> >SELECT source1, source2, ts, value
> >FROM rolling
> >WHERE source1 = 'aaa'
> >AND ts > 1 AND ts < 10;
> >
> >QUERY PLAN
> >`--SEARCH TABLE rolling USING INDEX sources (ANY(source1) AND
> >ANY(source2)
>AND ts>? AND ts<?)
>
> >This looks pretty OK on the sample dataset.
> >For some reason though this doesn't really work like that with the
> >real
> >dataset (which has an extra index on ts only),
> >and that seems to kick in instead of the real index.
> >So the query plan comes out completely different, and rows are not
> >naturally sorted by source1, source2, as I'd like.
> >[I know I can use "+ts" and/or drop the "ts" index altogether, but
> >I'm
> >trying to make a point here...]
> >So I add an extra ORDER BY clause:
> >
> >2)
> >
> >SELECT source1, source2, ts, value
> >FROM rolling
> >WHERE source1 = 'aaa'
> >  AND ts > 1 AND ts < 1
> >ORDER BY source1, source2, ts;
> >
> >QUERY PLAN
> >|--SEARCH TABLE rolling USING INDEX sources (ANY(source1) AND
> >ANY(source2)
>AND ts>? AND ts<?)
>`--USE TEMP B-TREE FOR RIGHT PART OF ORDER BY
> >
> >Here I don't really understand why a TEMP B-TREE is required at all.
> >Apparently, this adds extra processing at the end so I don't start
> >pulling
> >any rows until much later.
> >If the index is used, data would be *naturally* sorted by its key, so
> >I
> >don't really understand.
> >So I recur to a subquery:
> >
> >3)
> >
> >SELECT source1, source2, ts, value
> >FROM rolling
> >WHERE ((source1, source2) in
> >(SELECT DISTINCT source1, source2 from rolling where
> >source1='aaa')
> >  AND ts > 1 AND ts < 10)
> >
> >QUERY PLAN
> >|--SEARCH TABLE rolling USING INDEX sources (source1=? AND source2=?
> >AND
>ts>? AND ts<?)
>`--LIST SUBQUERY
> >   `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=?)
> >
> >This also seems to yield the right results, in the right order.
> >But the order looks pretty much like a coincidence to me.
> >So I'd rather explicitly state the sorting I'd like:
> >
> >4)
> >
> >SELECT source1, source2, ts, value
> >FROM rolling
> >WHERE ((source1, source2) in
> >(SELECT DISTINCT source1, source2 from rolling where
> >source1='aaa')
> >  AND ts > 1 AND ts < 10)
> >ORDER BY source1, source2, ts
> >
> >QUERY PLAN
> >|--SEARCH TABLE rolling USING INDEX sources (source1=? AND source2=?
> >AND
>ts>? AND ts<?)
>|--LIST SUBQUERY
> >|  `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=?)
> >`--USE TEMP B-TREE FOR ORDER BY
> >
> >Here the b-tree is *NOT* for "RIGHT PART OF" ORDER BY, it seems like
> >some
> >global sorting
> >which looks even worse than case 2.
> >
> >What am I doing wrong? Is this expected?
> >I just don't seem to be able to get what would have been a pretty
> >trivial
> >task with
> >last-century technologies (thinking of Paradox, dBase, CA-Clipper)...
> >
> >Thanks in advance!
> >Gerlando
> >
> >
> >On Mon, Jan 28, 2019 at 9:11 AM Gerlando Falauto
> >
> >wrote:
> >
> >>

Re: [sqlite] Min/Max and skip-scan optimizations

2019-02-04 Thread Gerlando Falauto
Hi Keith,

here's what I get as a query plan for your query:

5)
SELECT rolling.source1,
       rolling.source2,
       ts,
       value
  FROM (
        select distinct source1,
                        source2
          from rolling
         where source1 = 'aaa'
       ) as x
  JOIN rolling
    ON rolling.source1 = x.source1
   AND rolling.source2 = x.source2
 WHERE ts > 1
   AND ts < 1
ORDER BY 1,2,3;

QUERY PLAN
|--MATERIALIZE 1
|  `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=?)
|--SEARCH TABLE rolling USING INDEX sources (ANY(source1) AND ANY(source2) AND ts>? AND ts<?)
[...] 1 AND ts < 1;

QUERY PLAN
`--SEARCH TABLE rolling USING INDEX sources (ANY(source1) AND ANY(source2) AND ts>? AND ts<?)
[...] wrote:

>
> Like this?
>
> SELECT rolling.source1,
>        rolling.source2,
>        ts,
>        value
>   FROM (
>         select distinct source1,
>                         source2
>           from rolling
>          where source1 = 'aaa'
>        ) as x
>   JOIN rolling
>     ON rolling.source1 = x.source1
>    AND rolling.source2 = x.source2
>  WHERE ts > 1
>    AND ts < 10
> ORDER BY 1,2,3;
>
>
> ---
> The fact that there's a Highway to Hell but only a Stairway to Heaven says
> a lot about anticipated traffic volume.
>
> >-Original Message-
> >From: sqlite-users [mailto:sqlite-users-
> >boun...@mailinglists.sqlite.org] On Behalf Of Gerlando Falauto
> >Sent: Saturday, 2 February, 2019 15:20
> >To: SQLite mailing list
> >Subject: Re: [sqlite] Min/Max and skip-scan optimizations
> >
> >Hi,
> >it's me again, struggling with indices once again.
> >What I'm now trying to do is filter on source1 and by range of
> >timestamp.
> >Results should be naturally ordered by source1, source2,ts.
> >
> >1)
> >
> >SELECT source1, source2, ts, value
> >FROM rolling
> >WHERE source1 = 'aaa'
> >AND ts > 1 AND ts < 10;
> >
> >QUERY PLAN
> >`--SEARCH TABLE rolling USING INDEX sources (ANY(source1) AND
> >ANY(source2)
>AND ts>? AND ts<?)
>
> >This looks pretty OK on the sample dataset.
> >For some reason though this doesn't really work like that with the
> >real
> >dataset (which has an extra index on ts only),
> >and that seems to kick in instead of the real index.
> >So the query plan comes out completely different, and rows are not
> >naturally sorted by source1, source2, as I'd like.
> >[I know I can use "+ts" and/or drop the "ts" index altogether, but
> >I'm
> >trying to make a point here...]
> >So I add an extra ORDER BY clause:
> >
> >2)
> >
> >SELECT source1, source2, ts, value
> >FROM rolling
> >WHERE source1 = 'aaa'
> >  AND ts > 1 AND ts < 1
> >ORDER BY source1, source2, ts;
> >
> >QUERY PLAN
> >|--SEARCH TABLE rolling USING INDEX sources (ANY(source1) AND
> >ANY(source2)
>AND ts>? AND ts<?)
>`--USE TEMP B-TREE FOR RIGHT PART OF ORDER BY
> >
> >Here I don't really understand why a TEMP B-TREE is required at all.
> >Apparently, this adds extra processing at the end so I don't start
> >pulling
> >any rows until much later.
> >If the index is used, data would be *naturally* sorted by its key, so
> >I
> >don't really understand.
> >So I recur to a subquery:
> >
> >3)
> >
> >SELECT source1, source2, ts, value
> >FROM rolling
> >WHERE ((source1, source2) in
> >(SELECT DISTINCT source1, source2 from rolling where
> >source1='aaa')
> >  AND ts > 1 AND ts < 10)
> >
> >QUERY PLAN
> >|--SEARCH TABLE rolling USING INDEX sources (source1=? AND source2=?
> >AND
>ts>? AND ts<?)
>`--LIST SUBQUERY
> >   `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=?)
> >
> >This also seems to yield the right results, in the right order.
> >But the order looks pretty much like a coincidence to me.
> >So I'd rather explicitly state the sorting I'd like:
> >
> >4)
> >
> >SELECT source1, source2, ts, value
> >FROM rolling
> >WHERE ((source1, source2) in
> >(SELECT DISTINCT source1, source2 from rolling where
> >source1='aaa')
> >  AND ts > 1 AND ts < 10)
> >ORDER BY source1, source2, ts
> >
> >QUERY PLAN
> >|--SEARCH TABLE rolling USING INDEX sources (source1=? AND source2=?
> >AND
>ts>? AND ts<?)
>|--LIST SUBQUERY
> >|  `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=?)
> >`--USE TEMP B-TREE FOR ORDER BY
> >
>Here the b-tree is *NOT* for "RIGHT PART OF" ORDER BY, it seems like

Re: [sqlite] Min/Max and skip-scan optimizations

2019-02-03 Thread Simon Slavin
On 3 Feb 2019, at 9:52am, Gerlando Falauto  wrote:

> I do want them sorted, and I also want the whole (huge) dataset to be
> processable without having to store it all in memory or temp files.
> Sounds like the whole purpose of an index, doesn't it?
> I do know SQL is all about the result, not how it's obtained, though.

Absolutely.  You should be fine.

Try using the procedure in my previous message (the one which mentions ANALYZE) 
to find out the most useful index.

Once you've found that, you can delete the other index.  Then you've solved the 
problem and can move on.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Min/Max and skip-scan optimizations

2019-02-03 Thread Gerlando Falauto
Simon, Tim,

Il sab 2 feb 2019, 23:40 Simon Slavin  ha scritto:

> On 2 Feb 2019, at 10:19pm, Gerlando Falauto 
> wrote:
>
> > Results should be naturally ordered by source1, source2,ts.
>
> [Sorry, I missed this the first time.  Thanks, Tim.]
>
> Sorry, no.  You're making assumptions about how SQLite works internally.
> If you want your results sorted, ask for them sorted.  If you don't, don't.
>

I do want them sorted, and I also want the whole (huge) dataset to be
processable without having to store it all in memory or temp files.
Sounds like the whole purpose of an index, doesn't it?
I do know SQL is all about the result, not how it's obtained, though.


> Note that if you don't ask for them sorted and they come out sorted by
> good luck, there's no guarantee that they'll be sorted if you put different
> data in the table, or the same data with the rows in a different order, or
> if you update to a later version of SQLite.
>

I do realize that; that's why I'm not happy with what seems to do the trick
without an explicit ORDER BY clause.

Thank you for your help!

Gerlando


>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Min/Max and skip-scan optimizations

2019-02-02 Thread Keith Medcalf

Like this?

SELECT rolling.source1,
       rolling.source2,
       ts,
       value
  FROM (
        select distinct source1,
                        source2
          from rolling
         where source1 = 'aaa'
       ) as x
  JOIN rolling
    ON rolling.source1 = x.source1
   AND rolling.source2 = x.source2
 WHERE ts > 1
   AND ts < 10
ORDER BY 1,2,3;


---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.

>-Original Message-
>From: sqlite-users [mailto:sqlite-users-
>boun...@mailinglists.sqlite.org] On Behalf Of Gerlando Falauto
>Sent: Saturday, 2 February, 2019 15:20
>To: SQLite mailing list
>Subject: Re: [sqlite] Min/Max and skip-scan optimizations
>
>Hi,
>it's me again, struggling with indices once again.
>What I'm now trying to do is filter on source1 and by range of
>timestamp.
>Results should be naturally ordered by source1, source2,ts.
>
>1)
>
>SELECT source1, source2, ts, value
>FROM rolling
>WHERE source1 = 'aaa'
>AND ts > 1 AND ts < 10;
>
>QUERY PLAN
>`--SEARCH TABLE rolling USING INDEX sources (ANY(source1) AND
>ANY(source2)
>AND ts>? AND ts<?)
>
>This looks pretty OK on the sample dataset.
>For some reason though this doesn't really work like that with the
>real
>dataset (which has an extra index on ts only),
>and that seems to kick in instead of the real index.
>So the query plan comes out completely different, and rows are not
>naturally sorted by source1, source2, as I'd like.
>[I know I can use "+ts" and/or drop the "ts" index altogether, but
>I'm
>trying to make a point here...]
>So I add an extra ORDER BY clause:
>
>2)
>
>SELECT source1, source2, ts, value
>FROM rolling
>WHERE source1 = 'aaa'
>  AND ts > 1 AND ts < 1
>ORDER BY source1, source2, ts;
>
>QUERY PLAN
>|--SEARCH TABLE rolling USING INDEX sources (ANY(source1) AND
>ANY(source2)
>AND ts>? AND ts<?)
>`--USE TEMP B-TREE FOR RIGHT PART OF ORDER BY
>
>Here I don't really understand why a TEMP B-TREE is required at all.
>Apparently, this adds extra processing at the end so I don't start
>pulling
>any rows until much later.
>If the index is used, data would be *naturally* sorted by its key, so
>I
>don't really understand.
>So I recur to a subquery:
>
>3)
>
>SELECT source1, source2, ts, value
>FROM rolling
>WHERE ((source1, source2) in
>(SELECT DISTINCT source1, source2 from rolling where
>source1='aaa')
>  AND ts > 1 AND ts < 10)
>
>QUERY PLAN
>|--SEARCH TABLE rolling USING INDEX sources (source1=? AND source2=?
>AND
>ts>? AND ts<?)
>`--LIST SUBQUERY
>   `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=?)
>
>This also seems to yield the right results, in the right order.
>But the order looks pretty much like a coincidence to me.
>So I'd rather explicitly state the sorting I'd like:
>
>4)
>
>SELECT source1, source2, ts, value
>FROM rolling
>WHERE ((source1, source2) in
>(SELECT DISTINCT source1, source2 from rolling where
>source1='aaa')
>  AND ts > 1 AND ts < 10)
>ORDER BY source1, source2, ts
>
>QUERY PLAN
>|--SEARCH TABLE rolling USING INDEX sources (source1=? AND source2=?
>AND
>ts>? AND ts<?)
>|--LIST SUBQUERY
>|  `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=?)
>`--USE TEMP B-TREE FOR ORDER BY
>
>Here the b-tree is *NOT* for "RIGHT PART OF" ORDER BY, it seems like
>some
>global sorting
>which looks even worse than case 2.
>
>What am I doing wrong? Is this expected?
>I just don't seem to be able to get what would have been a pretty
>trivial
>task with
>last-century technologies (thinking of Paradox, dBase, CA-Clipper)...
>
>Thanks in advance!
>Gerlando
>
>
>On Mon, Jan 28, 2019 at 9:11 AM Gerlando Falauto
>
>wrote:
>
>> YES! Thank you!
>> Many thanks for the ".eqp full" tip also, that really explains a
>lot
>> (though I don't really understand any of it yet).
>>
>> Have a great day!
>> Gerlando
>>
>>
>> On Mon, Jan 28, 2019 at 6:50 AM Keith Medcalf 
>wrote:
>>
>>>
>>> Do you perhaps want this:
>>>
>>> select source1,
>>>        source2,
>>>        (
>>>         select min(ts)
>>>           from rolling
>>>          where source1 = x.source1
>>>            and source2 = x.source2
>>>        )
>>>   from (
>>>         select distinct source1,
>>>                         source2
>>>           from rolling
>>>        ) as x;
>>>
>>> SQLite version 3.27.0 2019-01-28 00:42:06

Re: [sqlite] Min/Max and skip-scan optimizations

2019-02-02 Thread Simon Slavin
On 2 Feb 2019, at 10:19pm, Gerlando Falauto  wrote:

> Results should be naturally ordered by source1, source2,ts.

[Sorry, I missed this the first time.  Thanks, Tim.]

Sorry, no.  You're making assumptions about how SQLite works internally.  If 
you want your results sorted, ask for them sorted.  If you don't, don't.

Note that if you don't ask for them sorted and they come out sorted by good 
luck, there's no guarantee that they'll be sorted if you put different data in 
the table, or the same data with the rows in a different order, or if you 
update to a later version of SQLite.
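
In concrete terms, a minimal sketch of "ask for them sorted", reusing the table and column names from earlier in this thread (the timestamp bounds are placeholders):

SELECT source1, source2, ts, value
  FROM rolling
 WHERE source1 = 'aaa'
   AND ts > ? AND ts < ?
 ORDER BY source1, source2, ts;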

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Min/Max and skip-scan optimizations

2019-02-02 Thread Tim Streater
On 02 Feb 2019, at 22:19, Gerlando Falauto  wrote:

> What I'm now trying to do is filter on source1 and by range of timestamp.
> Results should be naturally ordered by source1, source2,ts.

Not unless you use an ORDER BY.


-- 
Cheers  --  Tim
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Min/Max and skip-scan optimizations

2019-02-02 Thread Simon Slavin
On 2 Feb 2019, at 10:19pm, Gerlando Falauto  wrote:

> SELECT source1, source2, ts, value
> FROM rolling
> WHERE source1 = 'aaa'
> AND ts > 1 AND ts < 10;
> 
> [...snip...]

> So I add an extra ORDER BY clause:

Concentrate on the results you want.  Don't try to 'game' the library.  You 
don't understand how it thinks, and nor do I.  Do you want the things in a 
certain order ?  If not, don't ask for it.

To find the best results for the above query,

1) Put a typical amount of typical data into the table
2) Create two indexes:
first index (source1, ts)
second index (ts, source1)
3) Execute the command ANALYZE.
4) Execute the query you actually want (presumably without ORDER BY) using 
EXPLAIN QUERY PLAN.

Find what it does and get back to us.
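
Spelled out as a sketch (the index names here are invented, and the ts bounds are placeholders for realistic values):

CREATE INDEX idx_source1_ts ON rolling (source1, ts);
CREATE INDEX idx_ts_source1 ON rolling (ts, source1);
ANALYZE;
EXPLAIN QUERY PLAN
SELECT source1, source2, ts, value
  FROM rolling
 WHERE source1 = 'aaa'
   AND ts > ? AND ts < ?;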
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Min/Max and skip-scan optimizations

2019-02-02 Thread Gerlando Falauto
Hi,
it's me again, struggling with indices once again.
What I'm now trying to do is filter on source1 and by range of timestamp.
Results should be naturally ordered by source1, source2, ts.

1)

SELECT source1, source2, ts, value
FROM rolling
WHERE source1 = 'aaa'
AND ts > 1 AND ts < 10;

QUERY PLAN
`--SEARCH TABLE rolling USING INDEX sources (ANY(source1) AND ANY(source2)
AND ts>? AND ts<?)

[...] dataset (which has an extra index on ts only), and that seems to
kick in instead of the real index.
So the query plan comes out completely different, and rows are not
naturally sorted by source1, source2, as I'd like.
[I know I can use "+ts" and/or drop the "ts" index altogether, but I'm
trying to make a point here...]
So I add an extra ORDER BY clause:

2)

SELECT source1, source2, ts, value
FROM rolling
WHERE source1 = 'aaa'
  AND ts > 1 AND ts < 1
ORDER BY source1, source2, ts;

QUERY PLAN
|--SEARCH TABLE rolling USING INDEX sources (ANY(source1) AND ANY(source2)
AND ts>? AND ts<?)
`--USE TEMP B-TREE FOR RIGHT PART OF ORDER BY

Here I don't really understand why a TEMP B-TREE is required at all.
Apparently, this adds extra processing at the end so I don't start pulling
any rows until much later.
If the index is used, data would be *naturally* sorted by its key, so I
don't really understand.
So I recur to a subquery:

3)

SELECT source1, source2, ts, value
FROM rolling
WHERE ((source1, source2) in
    (SELECT DISTINCT source1, source2 from rolling where source1='aaa')
  AND ts > 1 AND ts < 10)

QUERY PLAN
|--SEARCH TABLE rolling USING INDEX sources (source1=? AND source2=? AND
ts>? AND ts<?)
`--LIST SUBQUERY
   `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=?)

This also seems to yield the right results, in the right order.
But the order looks pretty much like a coincidence to me.
So I'd rather explicitly state the sorting I'd like:

4)

SELECT source1, source2, ts, value
FROM rolling
WHERE ((source1, source2) in
    (SELECT DISTINCT source1, source2 from rolling where source1='aaa')
  AND ts > 1 AND ts < 10)
ORDER BY source1, source2, ts

QUERY PLAN
|--SEARCH TABLE rolling USING INDEX sources (source1=? AND source2=? AND
ts>? AND ts<?)
|--LIST SUBQUERY
|  `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=?)
`--USE TEMP B-TREE FOR ORDER BY

Here the b-tree is *NOT* for "RIGHT PART OF" ORDER BY, it seems like some
global sorting which looks even worse than case 2.

What am I doing wrong? Is this expected?
I just don't seem to be able to get what would have been a pretty trivial
task with last-century technologies (thinking of Paradox, dBase,
CA-Clipper)...

Thanks in advance!
Gerlando


On Mon, Jan 28, 2019 at 9:11 AM Gerlando Falauto
wrote:

> YES! Thank you!
> Many thanks for the ".eqp full" tip also, that really explains a lot
> (though I don't really understand any of it yet).
>
> Have a great day!
> Gerlando
>
>
> On Mon, Jan 28, 2019 at 6:50 AM Keith Medcalf  wrote:
>
>>
>> Do you perhaps want this:
>>
>> select source1,
>>source2,
>>(
>> select min(ts)
>>   from rolling
>>  where source1 = x.source1
>>and source2 = x.source2
>>)
>>   from (
>> select distinct source1,
>> source2
>>   from rolling
>>) as x;
>>
>> SQLite version 3.27.0 2019-01-28 00:42:06
>> Enter ".help" for usage hints.
>> Connected to a transient in-memory database.
>> Use ".open FILENAME" to reopen on a persistent database.
>> sqlite> .timer on
>> sqlite> CREATE TABLE `rolling` (
>>...> `source1`TEXT NOT NULL,
>>...> `source2`TEXT NOT NULL,
>>...> `ts`INTEGER NOT NULL,
>>...> `value`TEXT
>>...> );
>> Run Time: real 0.002 user 0.00 sys 0.00
>> sqlite>
>> sqlite> CREATE INDEX `sources` ON `rolling` (
>>...> `source1`,
>>...> `source2`,
>>...> `ts`
>>...> );
>> Run Time: real 0.001 user 0.00 sys 0.00
>> sqlite>
>> sqlite> INSERT INTO rolling
>>...> WITH RECURSIVE
>>...>   src1( source1 ) AS ( VALUES("aaa") UNION ALL VALUES("bbb")
>> ),
>>...>   src2( source2 ) AS ( VALUES("X1") UNION ALL VALUES("X2")
>> UNION ALL
>>...> VALUES("X3") UNION ALL VALUES("X4") ),
>>...>   cnt( ts, value) AS (
>>...>   VALUES( 0, "")
>>...> UNION ALL
>>...>   SELECT ts+1, value FROM cnt LIMIT 100)
>>...>
>>...> select src1.source1, src2.source2, cnt.* from src1, src2, cnt;
>> Run Time: real 8.920 user 8.843750 sys 0.078125
>> sqlite>
>> sqlite> analyze;
>> Run Time: real 1.285 user 1.281250 sys 0.00
>> sqlite> .eqp full
>> sqlite>
>> sqlite> select source1,
>>...>source2,
>>...>(
>>...> select min(ts)
>>...>   from rolling
>>...>  where source1 = x.source1
>>...>and source2 = x.source2
>>...>)
>>...>   from (
>>...> select distinct source1,
>>...> source2
>>...>   from rolling
>>...>) as x;
>> QUERY PLAN
>> |--CO-ROUTINE 2
>> |  `--SCAN TABLE rolling USING COVERING INDEX sources (~7864320 rows)
>> |--SCAN SUBQUERY 2 AS x (~7864320 rows)
>> `--CORRELATED SCALAR SUBQUERY 1
>>`--SEARCH TABLE rolling USING COVERING INDEX sources (source1=? AND
>> source2=?) (~983040 rows)
>> addr  opcode p1p2p3p4 p5  comment
>>   -        -  --  -
>> 0 Init   0 64000  Start at 64
>> 1 InitCoroutine  1 23200  x
>> 2 Null   1 4 008  r[4]=NULL
>> 3 OpenRead   4 3 0 k(4)   00  root=3 iDb=0;
>> sources
>> 4 ColumnsUsed4 0 0 3  00
>> 5 Explain5 0 0 SCAN TABLE rolling USING COVERING
>> INDEX sources (~7864320 rows)  00
>> 6 Noop   0 0 000  Begin
>> WHERE-loop0: rolling
>> 7 Rewind 4 212 0  00
>> 8 Noop   0 0 000  Begin
>> WHERE-core
>> 9 Column 4 0 200
>> r[2]=rolling.source1
>> 10Column 4 1 300
>> r[3]=rolling.source2
>> 11Ne 2 134 (BINARY)   80  if
>> r[4]!=r[2] goto 13
>> 12Eq 3 205 (BINARY)   80  if
>> r[5]==r[3] goto 20
>> 13Copy   2 4 100
>> r[4..5]=r[2..3]
>> 14Yield  1 0 000
>> 15Noop   0 0 000  End
>> WHERE-core
>> 16Column 4 0 600  r[6]=
>> 17

Re: [sqlite] Min/Max and skip-scan optimizations

2019-01-28 Thread Gerlando Falauto
YES! Thank you!
Many thanks for the ".eqp full" tip also, that really explains a lot
(though I don't really understand any of it yet).

Have a great day!
Gerlando


On Mon, Jan 28, 2019 at 6:50 AM Keith Medcalf  wrote:

>
> Do you perhaps want this:
>
> select source1,
>source2,
>(
> select min(ts)
>   from rolling
>  where source1 = x.source1
>and source2 = x.source2
>)
>   from (
> select distinct source1,
> source2
>   from rolling
>) as x;
>
> SQLite version 3.27.0 2019-01-28 00:42:06
> Enter ".help" for usage hints.
> Connected to a transient in-memory database.
> Use ".open FILENAME" to reopen on a persistent database.
> sqlite> .timer on
> sqlite> CREATE TABLE `rolling` (
>...> `source1`TEXT NOT NULL,
>...> `source2`TEXT NOT NULL,
>...> `ts`INTEGER NOT NULL,
>...> `value`TEXT
>...> );
> Run Time: real 0.002 user 0.00 sys 0.00
> sqlite>
> sqlite> CREATE INDEX `sources` ON `rolling` (
>...> `source1`,
>...> `source2`,
>...> `ts`
>...> );
> Run Time: real 0.001 user 0.00 sys 0.00
> sqlite>
> sqlite> INSERT INTO rolling
>...> WITH RECURSIVE
>...>   src1( source1 ) AS ( VALUES("aaa") UNION ALL VALUES("bbb") ),
>...>   src2( source2 ) AS ( VALUES("X1") UNION ALL VALUES("X2")
> UNION ALL
>...> VALUES("X3") UNION ALL VALUES("X4") ),
>...>   cnt( ts, value) AS (
>...>   VALUES( 0, "")
>...> UNION ALL
>...>   SELECT ts+1, value FROM cnt LIMIT 100)
>...>
>...> select src1.source1, src2.source2, cnt.* from src1, src2, cnt;
> Run Time: real 8.920 user 8.843750 sys 0.078125
> sqlite>
> sqlite> analyze;
> Run Time: real 1.285 user 1.281250 sys 0.00
> sqlite> .eqp full
> sqlite>
> sqlite> select source1,
>...>source2,
>...>(
>...> select min(ts)
>...>   from rolling
>...>  where source1 = x.source1
>...>and source2 = x.source2
>...>)
>...>   from (
>...> select distinct source1,
>...> source2
>...>   from rolling
>...>) as x;
> QUERY PLAN
> |--CO-ROUTINE 2
> |  `--SCAN TABLE rolling USING COVERING INDEX sources (~7864320 rows)
> |--SCAN SUBQUERY 2 AS x (~7864320 rows)
> `--CORRELATED SCALAR SUBQUERY 1
>`--SEARCH TABLE rolling USING COVERING INDEX sources (source1=? AND
> source2=?) (~983040 rows)
> addr  opcode p1p2p3p4 p5  comment
>   -        -  --  -
> 0 Init   0 64000  Start at 64
> 1 InitCoroutine  1 23200  x
> 2 Null   1 4 008  r[4]=NULL
> 3 OpenRead   4 3 0 k(4)   00  root=3 iDb=0;
> sources
> 4 ColumnsUsed4 0 0 3  00
> 5 Explain5 0 0 SCAN TABLE rolling USING COVERING
> INDEX sources (~7864320 rows)  00
> 6 Noop   0 0 000  Begin
> WHERE-loop0: rolling
> 7 Rewind 4 212 0  00
> 8 Noop   0 0 000  Begin
> WHERE-core
> 9 Column 4 0 200
> r[2]=rolling.source1
> 10Column 4 1 300
> r[3]=rolling.source2
> 11Ne 2 134 (BINARY)   80  if
> r[4]!=r[2] goto 13
> 12Eq 3 205 (BINARY)   80  if
> r[5]==r[3] goto 20
> 13Copy   2 4 100
> r[4..5]=r[2..3]
> 14Yield  1 0 000
> 15Noop   0 0 000  End
> WHERE-core
> 16Column 4 0 600  r[6]=
> 17Column 4 1 700  r[7]=
> 18SeekGT 4 216 2  00  key=r[6..7]
> 19  Goto   1 8 000
> 20Next   4 8 001
> 21Noop   0 0 000  End WHERE-loop0:
> rolling
> 22EndCoroutine   1 0 000
> 23Explain230 0 SCAN SUBQUERY 2 AS x (~7864320
> rows)  00
> 24Noop   0 0 000  Begin
> WHERE-loop0: x
> 25InitCoroutine  1 0 200
> 26  Yield  1 62000  next row of x
> 27  Noop   0 0 000  Begin
> WHERE-core
> 28  Copy   2 9 000  r[9]=r[2];
> x.source1
> 29  Copy   3 10000  

Re: [sqlite] Min/Max and skip-scan optimizations

2019-01-27 Thread Keith Medcalf

Do you perhaps want this:

select source1, 
   source2, 
   (
select min(ts) 
  from rolling 
 where source1 = x.source1 
   and source2 = x.source2
   ) 
  from (
select distinct source1, 
source2 
  from rolling
   ) as x;

SQLite version 3.27.0 2019-01-28 00:42:06
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .timer on
sqlite> CREATE TABLE `rolling` (
   ...> `source1`TEXT NOT NULL,
   ...> `source2`TEXT NOT NULL,
   ...> `ts`INTEGER NOT NULL,
   ...> `value`TEXT
   ...> );
Run Time: real 0.002 user 0.00 sys 0.00
sqlite>
sqlite> CREATE INDEX `sources` ON `rolling` (
   ...> `source1`,
   ...> `source2`,
   ...> `ts`
   ...> );
Run Time: real 0.001 user 0.00 sys 0.00
sqlite>
sqlite> INSERT INTO rolling
   ...> WITH RECURSIVE
   ...>   src1( source1 ) AS ( VALUES("aaa") UNION ALL VALUES("bbb") ),
   ...>   src2( source2 ) AS ( VALUES("X1") UNION ALL VALUES("X2") UNION ALL
   ...> VALUES("X3") UNION ALL VALUES("X4") ),
   ...>   cnt( ts, value) AS (
   ...>   VALUES( 0, "")
   ...> UNION ALL
   ...>   SELECT ts+1, value FROM cnt LIMIT 100)
   ...>
   ...> select src1.source1, src2.source2, cnt.* from src1, src2, cnt;
Run Time: real 8.920 user 8.843750 sys 0.078125
sqlite>
sqlite> analyze;
Run Time: real 1.285 user 1.281250 sys 0.00
sqlite> .eqp full
sqlite>
sqlite> select source1,
   ...>source2,
   ...>(
   ...> select min(ts)
   ...>   from rolling
   ...>  where source1 = x.source1
   ...>and source2 = x.source2
   ...>)
   ...>   from (
   ...> select distinct source1,
   ...> source2
   ...>   from rolling
   ...>) as x;
QUERY PLAN
|--CO-ROUTINE 2
|  `--SCAN TABLE rolling USING COVERING INDEX sources (~7864320 rows)
|--SCAN SUBQUERY 2 AS x (~7864320 rows)
`--CORRELATED SCALAR SUBQUERY 1
   `--SEARCH TABLE rolling USING COVERING INDEX sources (source1=? AND 
source2=?) (~983040 rows)
addr  opcode p1p2p3p4 p5  comment
  -        -  --  -
0 Init   0 64000  Start at 64
1 InitCoroutine  1 23200  x
2 Null   1 4 008  r[4]=NULL
3 OpenRead   4 3 0 k(4)   00  root=3 iDb=0; sources
4 ColumnsUsed4 0 0 3  00
5 Explain5 0 0 SCAN TABLE rolling USING COVERING INDEX 
sources (~7864320 rows)  00
6 Noop   0 0 000  Begin WHERE-loop0: 
rolling
7 Rewind 4 212 0  00
8 Noop   0 0 000  Begin WHERE-core
9 Column 4 0 200  
r[2]=rolling.source1
10Column 4 1 300  
r[3]=rolling.source2
11Ne 2 134 (BINARY)   80  if r[4]!=r[2] 
goto 13
12Eq 3 205 (BINARY)   80  if r[5]==r[3] 
goto 20
13Copy   2 4 100  r[4..5]=r[2..3]
14Yield  1 0 000
15Noop   0 0 000  End WHERE-core
16Column 4 0 600  r[6]=
17Column 4 1 700  r[7]=
18SeekGT 4 216 2  00  key=r[6..7]
19  Goto   1 8 000
20Next   4 8 001
21Noop   0 0 000  End WHERE-loop0: 
rolling
22EndCoroutine   1 0 000
23Explain230 0 SCAN SUBQUERY 2 AS x (~7864320 rows)  00
24Noop   0 0 000  Begin WHERE-loop0: x
25InitCoroutine  1 0 200
26  Yield  1 62000  next row of x
27  Noop   0 0 000  Begin WHERE-core
28  Copy   2 9 000  r[9]=r[2]; x.source1
29  Copy   3 10000  r[10]=r[3]; 
x.source2
30  Null   0 1212   00  r[12..12]=NULL; 
Init subquery result
31  Integer1 13000  r[13]=1; LIMIT 
counter
32  Null   0 1415   00  r[14..15]=NULL
33  OpenRead   5 3 0 k(4)   00  root=3 iDb=0; 
sources
34  ColumnsUsed5 0 0 7  00
35  Explain35