Hi,

We have a very similar system that captures traffic incident information, such as accidents, roadworks and traffic jams, and sends personalised information to drivers based on their routes.

We realised after a few years that our original database design was inefficient: we had a single table of about 60GB, circa 500M rows, consisting of traffic incidents and updates. The inefficiency was in our design, not in SQLite.

This inefficiency showed when we ran our daily report summarising the traffic information for historical trends, e.g. the average time for a road accident to be cleared on a given stretch of road on a specific day or time of day. This report meant reading in circa 500M rows and 60GB of data each night, and it took circa 90 mins to run as we did a lot of calculations.
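
To give a flavour, the questions that report answered were conceptually along these lines (the table and column names below are purely illustrative, not our real schema, and they assume times stored as Unix epoch seconds):

    -- Illustrative only: average clearance time for accidents per stretch
    -- of road, broken down by day of week.
    SELECT road_segment,
           strftime('%w', start_time, 'unixepoch') AS day_of_week,
           AVG(end_time - start_time) AS avg_clearance_secs,
           COUNT(*) AS incidents
    FROM incident
    WHERE incident_type = 'accident'
    GROUP BY road_segment, day_of_week;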

We were not able to change the structure of the database much (it was too far down the line), so we spent a long time looking at the data with cold, wet towels on our foreheads in darkened rooms :) Once we had analysed the data, we realised that most of our 'updates' were not actually updates at all; all we were really interested in was the start and end times of incidents, and all the 'stuff' in the middle was just noise. The problem we had was that disruptions could last a long time and there was no actual end-of-incident marker, just the absence of another disruption with that unique identifier and a later timestamp. This sounds similar to your situation. Basically we constructed a simple state model based on time.

What we now do is run a Perl script each night to remove anything that sits between the first and last records for an incident. We know that the 'last' record may move on the next day, but at that point in time it is still the last/end/most recent indicator. Our data is structured around time, so we always store a time-of-insertion value in each row, as things may not be ordered by rowid.
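
In plain SQL, the nightly prune is conceptually something like the sketch below. The names are made up for illustration (the real thing is a Perl script against our own schema), and the time-of-insertion column is what lets us keep just the earliest and latest rows per incident:

    -- Illustrative sketch: keep only the earliest and latest update per
    -- incident identifier and delete everything in between.
    DELETE FROM incident_update
    WHERE inserted_at > (SELECT MIN(inserted_at)
                         FROM incident_update AS u
                         WHERE u.incident_id = incident_update.incident_id)
      AND inserted_at < (SELECT MAX(inserted_at)
                         FROM incident_update AS u
                         WHERE u.incident_id = incident_update.incident_id);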

Once we had this insight, we took our database down from 60GB to 800MB, and the report went from 90 mins to 90 secs. We have to give credit to this mailing list for the patient and courteous way it helped with our often stupid questions.

The lessons we learnt from this are:

1. Get your database schema right from the start. We didn't. In hindsight we should have spent longer on this.

2. Put in more information than you think you'll need, just in case. We did that: we stored time information at per-second granularity.

3. Don't worry too much about disk space unless you are on a constrained device. In hindsight we could have stored even more data :)

4. Think hard about solutions that are simple. We love simplicity as this makes it easier to go back and change things. Too many dependencies would be too difficult to unpick. Keeping it simple also works for us as we're not experts :)

5. SQLite seems to be able to do anything we want it to. We know the 'limitation' of it not being a full client/server database; just to be clear, we do not consider this a limitation at all but rather a virtue. Other people seem worried about the 'lack' of some datatypes; we do masses of data and date conversions as needed and it has never been a speed issue, or any issue at all.

6. Ask this group for help. We realised that the more information we provided in a clear and concise manner in our help email, the more help we got; it was a virtuous circle. The combined knowledge of the people in this group far, far exceeds our own. As a rule of thumb, when we ask for help we expect it to take a minimum of 1-2 hours to write the email. That may be because we're not SQL experts. Also, follow up every email: if people have taken the time to reply to us, they deserve a reply themselves. As we are English, we could end up in an endless thanks-for-the-reply loop :) The group has an exceptionally high signal-to-noise ratio and is invariably courteous.

I would be astonished if you can't get SQLite to do what you want to do. We have never managed to touch the sides of the system and suspect we never will.

More than happy to answer more questions as this group helped us and it's only fair we offer the help back. I will state that we are not SQL (or SQLite) experts :)

Rob

On 30 Aug 2019, at 21:44, Random Coder wrote:

First off, if this sort of "code review" style question is
inappropriate for this list, please feel free to reply to me directly
and tell me to stop, and I'll know to not do this again.

That said, I have a question on the use of SQLite.  At the risk of
falling into the XY problem, I'll give some details on what I'm
doing, and then how I'm doing it.  My basic question is whether what
I'm doing is valid, and whether I'm doing anything needlessly
wasteful.

I have a system monitoring events (I know I'm being vague on the
exact nature of these events; I can't provide too many details,
sorry.  Please try to accept what I say about them as given).  The
events have IDs that are generated externally; they're 30-40 ASCII
characters long, appear random, and are known by external means to be
unique for a given event.  For the purposes of this particular
problem, the only thing I care about tracking is when I first saw an
event, and the last time I saw it.  For better or worse, this
ecosystem already thinks about timestamps as the number of minutes
since a specific epoch, and is used to treating all time values as an
integer in that space, so I'm doing the same here.

So, I have a RESTful server written in Python, using APSW to create a
simple SQLite database:

    CREATE TABLE IF NOT EXISTS
        event(
            event_id TEXT PRIMARY KEY,
            first_seen INTEGER,
            last_seen INTEGER
        ) WITHOUT ROWID;

Every time a new event comes in (they might be very out of order), I do a:

    INSERT INTO event(event_id, first_seen, last_seen) VALUES(?, ?, ?)
    ON CONFLICT(event_id) DO UPDATE SET
        first_seen = MIN(excluded.first_seen, event.first_seen),
        last_seen = MAX(excluded.last_seen, event.last_seen);

This creates the record for the event if it's new, or possibly
updates an existing one with new values.  To give a sense of scale, I
have around 5 billion events stored right now for the past couple of
years in a 250GB database, and I see around 20 million per day, of
which some small percentage are new.

The important thing I can do for users is pull up reports.  The
report is roughly a summary of how old events are (when they were
first seen, and how long they've been seen for).  Outliers are
highlighted, as are events that haven't been seen at all.  The user
will provide around ten thousand event IDs; the majority of them,
approaching 99%, will exist in my database.  When the user requests a
report, I create an in-memory database:

    ATTACH ':memory:' AS mem_db;
    CREATE TABLE mem_db.valid(event_id TEXT PRIMARY KEY);

And populate that table with the events the user is interested in.
I'm doing this since I won't get the list of items in one go; it's
built up over some minutes.  If the system dies in the middle of a
request, it's OK to start over.  Then I run:

    SELECT
        mem_db.valid.event_id,
        event.first_seen,
        event.last_seen
    FROM
        mem_db.valid
    LEFT JOIN event ON
        event.event_id = mem_db.valid.event_id;

And gather up the results and pretty them up for the user.
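
For completeness, the population step above is nothing clever: just a
parameterized insert per incoming event ID as the list trickles in,
roughly as follows (the OR IGNORE is only there on the assumption
that the same ID may be submitted more than once):

    -- Run once per incoming event ID, bound via APSW; OR IGNORE is an
    -- assumption to tolerate duplicate submissions of the same ID.
    INSERT OR IGNORE INTO mem_db.valid(event_id) VALUES(?);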

Does all of this seem valid?  It works, so I'm OK with it, but I'm far
from a SQLite expert, and I want to know if I'm going to be backing
myself into a corner or otherwise torturing things that should be done
differently.  Or, if the answer is: "Don't use SQLite for that",
that's fine too; I'll start looking at other options.

Thanks for any feedback.