Re: [sqlite] Building multiple indexes in one scan over a table

Markus Schaber Tue, 10 Jun 2014 23:49:28 -0700

Hi,

> Von: Fabian Giesen
> I'm working on a profiler that uses SQLite to store and index the
> results.
> 
> Due to the nature of the task, the workload ends up splitting into
> essentially three phases:
> 
> 1. Bulk insertion of relatively large amounts of data (low hundreds
> of megabytes to tens of gigabytes); a typical profile generates on
> the order of 10MB of sampling data per second.
> 
> 2. Indexing. The first phase normally runs without any indexes on the
> sample tables, to maximize insertion throughput. In the second phase
> we create the indexes we need to retrieve the samples efficiently.
> (We found "batch-indexing" after the fact to generally lead to
> shorter overall processing times.)
> 
> 3. Browse the results. Once the samples are indexed, the user can
> look around, analyze the results using queries, and so forth. This is
> essentially read-only.
> 
> Phase 2 is crucial for quick retrieval, but at the same time it
> essentially amounts to wait time for the user, so we'd like to get it
> over with as quickly as possible. Now, looking at what happens in
> there, I noticed that we often end up creating several indexes on the
> same table back-to-back:
> 
>    CREATE INDEX samples_by_time ON samples(time);
>    CREATE INDEX samples_by_name ON samples(name_id);
>    CREATE INDEX samples_by_type ON samples(type_id);
> 
> Each of these statements ends up scanning over the entire table once.
> Since the tables in question are quite large, I would love to have
> some way to create multiple indexes in one pass. A cursory
> examination of the VDBE code produced for the CREATE INDEX statements
> indicates that SQLite's backend should be capable of doing so (though
> I'm no expert), but I don't see a way to express what I want in
> standard SQL.
> 
> So, is there any way to create multiple indexes for the same table in
> one operation in SQLite? If not, how hard would it be to add?


If one of your columns is unique, you could declare this column as primary key 
and use "WITHOUT ROWID".

If the column is also of type int, you could use the "INTEGER PRIMARY KEY" 
optimization which has a similar effect.

This saves the time of creating the index altogether, saves a few bytes for the 
row id column, and the table will be pre-ordered by said value.

But be sure you read the caveats in the documentation before proceeding.
https://www.sqlite.org/withoutrowid.html
https://www.sqlite.org/lang_createtable.html#rowid
 
I'm also not sure how this affects insertion time, but if the value is growing 
monotoneously, it should not differ too much from the standard auto-generated 
ROWID behavior which has the same properties.


Best regards

Markus Schaber

CODESYS(r) a trademark of 3S-Smart Software Solutions GmbH

Inspiring Automation Solutions

3S-Smart Software Solutions GmbH
Dipl.-Inf. Markus Schaber | Product Development Core Technology
Memminger Str. 151 | 87439 Kempten | Germany
Tel. +49-831-54031-979 | Fax +49-831-54031-50

E-Mail: m.scha...@codesys.com | Web: http://www.codesys.com | CODESYS store: 
http://store.codesys.com
CODESYS forum: http://forum.codesys.com

Managing Directors: Dipl.Inf. Dieter Hess, Dipl.Inf. Manfred Werner | Trade 
register: Kempten HRB 6186 | Tax ID No.: DE 167014915

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Building multiple indexes in one scan over a table

Reply via email to