Am 27.11.2008 um 09:12 schrieb Simon Bulman:

> I have been playing around with SQLite to use as an alternative to  
> one of
> our proprietary file formats used to read large amounts of data. Our
> proprietary format performs very badly i.e. takes a long time to  
> load some
> data; as expected SQLite is lighting quick in comparison - great!
>
> One considerable stumbling block is the footprint (size) of the  
> database
> file on disk. It turns out that SQLite is roughly 7x larger than our
> proprietary format - this is prohibitive. The data is pretty simple  
> really,
> 2 tables
>
> Table 1
>
> BIGINT (index),  VARCHAR(30), VARCHAR(10)
>
>
> Table 2
>
> BIGINT (index), FLOAT
>
>
> For a particular data set Table1 has 1165 rows and Table 2 has 323  
> rows,
> however typically Table 2 becomes bigger for larger models. The size  
> on disk
> of this file is 11.8 Mb (compared to 1.7 Mb for our proprietary  
> format). I
> have noticed that if I drop the indexes the size drops dramatically -
> however the query performance suffers to an unacceptable level.
>
> For a larger model the DB footprint is 2.2 Gb compared to 267 Mb for  
> the
> proprietary format.
>
> Does anybody have any comments on this? Are there any configuration  
> options
> or ideas I could use to reduce the footprint of the db file?


I don't think you'll be able to make SQLite as efficient (regarding  
storage size) as a custom file format, because it has to have some  
overhead for indexes, etc.

However, one thing that comes to mind is the way string data is stored:
If you're concerned about disk space an your string data is mostly  
ASCII, make sure your strings are stored as UTF-8 - for ASCII string  
data, this will save you one byte per character in the string data  
storage.
To enforce UTF-8 string storage, execute "PRAGMA encoding='UTF-8'" as  
the first command when creating the database (before you create and  
tables).
You can query the format using "PRAGMA encoding" - UTF-16 encodings  
will store two bytes / character, regardless of the actual characters...

Note that this doesn't mean your database size will shrink to half the  
size - it merely means you'll be able to fit more rows onto a single  
page, thus eventually you should see a decrease in file size when  
comparing UTF-16 vs. UTF-8 databases.

BTW: are you aware that SQLite database won't shrink by themselves?  
You'll have to vacuum them to reclaim unused space (see 
<http://www.sqlite.org/faq.html#q12 
 >)

HTH,
</jum>

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to