On 2015-06-10 02:45 PM, Igor Stassiy wrote: > Hello, > > I have a question about controlling the size of sqlite database containing > records of the form id -> payload where payload has fixed size and id is an > integer column. > > I am developing a simple storage for points in an sqlite database, the > problem is that space consumption on disk is slightly unsatisfying. Here is > what I mean: > > A program to generate some data for the database: > > #include <cstdlib>#include <cstdint>#include <iostream> > using namespace std; > > int main() { > int n; > cin >> n; > > for(int i = 0; i < n; i++) { > double lat = 180*(rand()*1.0/RAND_MAX); > double lon = 180*(rand()*1.0/RAND_MAX); > int64_t id = rand()*1LL*rand(); > cout << id << "," << lat << "," << lon << "\n"; > } > > return 0;} > > To import the data run: > > g++ filldb.cpp > echo 100000 | ./a.out > db.text > sqlite3 test.db > sqlite> create table points(id UNSIGNED INT, lat REAL, lon REAL); > sqlite> .separator , > sqlite> .mode csv > sqlite> .import db.text points > > The size of the database right after this shows 3.3M, however the "real" > size of data is 100000*(8 bytes)*3 = 2.4M and after > > sqlite> create index points_index on points(id); > > the database size more than doubles to 5.1M > > Changing id UNSIGNED INT to INTEGER PRIMARY KEY does help, but is there > anything else to be done, in particular when the database is static (we > know there wont be any inserts)?
A database structure has overheads and SQLite doesn't stick to only saving the bits that are needed (since this is never a requirement) - so you won't ever get the same space utilization as a flat-file might have. You can check exact space usage with the analyzer utility from the downloads page. Some things to note: Changing the id type to INTEGER PRIMARY KEY helps because that makes it an alias for the rowid which is the table's own index of rows and is always Integer. In other words, you remove the need for that column to contain its own set of data, which makes the space needed smaller. Secondly, you may also use a table without a rowid, which would look like this: create table points(id INT, lat REAL, lon REAL) WITHOUT ROWID; Notice also that SQLite has no strong types, it only has affinities, so there is no difference between UNSIGNED INT and INT. They all will be able to be represented as signed 64-bit integers, or indeed anything else you put in such a column (unless it is the rowid alias as described above, which can only hold integer values). More importantly than all the above, the Primary key is essentially an index on the table for id. There is no need to create another index on id. Creating an Index has to put all the data included in the index in a separate sorted B-Tree structure, which in turn will obviously eat up another large amount of space. If you are this concerned with space and the data won't change, why not just save the data in a flatfile? If you like the SQL way of doing things, you can read the file it into a SQLite TEMP table at runtime in Memory without any disk access, and work with it from there. 2MB or 5MB aren't exactly stellar and even the most basic embedded systems would have a memory capacity in excess of that. (And if it doesn't, then SQL isn't really an option to start with). > > Thank you > > Igor > _______________________________________________ > sqlite-users mailing list > sqlite-users at mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users