Re: [sqlite] Min/Max and skip-scan optimizations

Keith Medcalf Tue, 05 Feb 2019 16:27:05 -0800

On Tuesday, 5 February, 2019 15:12, Gerlando Falauto wrote:

> I could've just used directories and logfiles instead of abusing
> a relational database but I just thought it would more convenient
> to issue a query and use a cursor.


Well, the "abusing a relational database" is the correct terminology because 
you have not normalized the data before storing it.  Failure to normalize 
causes all sorts of problems when you attempt to "retrieve" and "store" data -- 
if the data is not normalized you should not really expect anything more 
performant than would be obtained by simply sequentially scanning sequential 
log files.  Also, relational databases do not have cursors (cursors are an 
illusion).  Relational databases operate on "sets" of data all at one get-go.  
The appearance of "sequential execution row by row" is merely an artifice of 
the current limits of computer technology, and what some refer to a "cursors" 
are merely implementation details or "magic programming" allowing access to a 
"set" in a row by row fashion.

You probably want separate tables for "source1" and for "source2" because these 
are not keys of the events, they are attributes of the events, and contain 
duplicate data.  Similarly the timestamp is an attribute of an event, not a key 
of an event.  The only true key of the event is a pseudo-key conveying the 
order in which the events happened and nothing more (for which a simple record 
number or "integer primary key") will suffice.

Now you have to decide whether or not "source2" is dependant on "source1" or 
not.  This is rather simple.  If the same "source2" can occur in combination 
with several "source1" then it is independant.  (This also depends on what you 
want to do with the data after you have it.)

So now you have the following minimally normalized schema:

create table Source1
(
  id integer primary key,
  name text collate nocase unique
);
create table Source2
(
  id integer primary key,
  name text collate nocase unique
);
create table Events
(
  id integer primary key,
  Source1id integer not null references Source1,
  Source2id integer not null references Source2,
  ts real not null,
  --- other data items ---
);
create index EventSource1 on Events (Source1id, Source2id);
create index EventSource2 on Events (Source2id, Source1id);
create index EventTS on Events (ts);

So now if you want all the combinations of source1 and source2 it is simply:

  select source1.name, source2.name 
    from source1, source2
order by 1, 2;

If you only want them where there are events then:

  select source1.name, source2.name 
    from source1, source2
   where exists (select * 
                   from events 
                  where events.source1id == source1.id 
                    and events.source2id == source2.id)
order by 1, 2;

If you want all the stuff between two ts you can do:

select source1.name, source2.name, events.*
  from source1, source2, events
 where source1.id == events.source1id
   and source2.id == events.source2id
   and events.id between coalesce((select max(id)
                                     from events
                                    where ts <= :StartTime), 
                                  (select min(id)
                                     from events))
                     and coalesce((select min(id)
                                     from events
                                    where ts >= :StartTime), 
                                  (select max(id)
                                     from events))
order by events.id;

If you only want the data for source2 = 'GATE' you can do the same and add:

   and source2.name == 'GATE'

or probably more efficient:

   and events.source2id == (select id from source2 where name == 'GATE')

and even other combinations:

   and events.source1id in (select id from source1 where name in ('ABA', 'AXX'))
   and events.source2id in (select id from source2 where name == 'GATE')

and so on and so forth, and let the query optimizer decide the best way to 
actually retrieve the data you requested.


---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.




_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Min/Max and skip-scan optimizations

Reply via email to