On Nov 28, 2008, at 7:37 AM, Simon Bulman wrote:

> Ahhh, sorry, I wrongly calculated the number of rows in table 2. It
> actually has 29581 rows. Still surprised at the 7x size increase, but
> perhaps that is not explained by the overheads?

The overhead should typically be about 20% or less - not 7x. Something
else is going wrong. Run sqlite3_analyzer.exe and post the results
somewhere that we can read them. I suspect the output of
sqlite3_analyzer.exe will give us the clues we need to figure out what
is amiss.
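Short of the full analyzer report, a quick sanity check is to ask the database itself how its pages are being used. A minimal sketch, assuming a SQLite build recent enough to support the page_count and freelist_count pragmas:

```python
import sqlite3

def size_stats(path):
    """Report page-level size statistics for a SQLite database file."""
    con = sqlite3.connect(path)
    page_size = con.execute("PRAGMA page_size").fetchone()[0]
    page_count = con.execute("PRAGMA page_count").fetchone()[0]
    freelist = con.execute("PRAGMA freelist_count").fetchone()[0]
    con.close()
    return {
        "page_size": page_size,               # bytes per page
        "file_bytes": page_size * page_count,  # total file size
        "free_bytes": page_size * freelist,    # reclaimable via VACUUM
    }
```

A large free_bytes figure means the file simply needs a VACUUM; a large file_bytes with few rows usually points at the page size or at index overhead, which the analyzer report will break down in detail.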


>
>
> Sorry for the misleading info,
>
> Cheers,
> S.
>
> -----Original Message-----
> From: Jens Miltner [mailto:[EMAIL PROTECTED]
> Sent: 28 November 2008 08:38
> To: [EMAIL PROTECTED]
> Cc: General Discussion of SQLite Database
> Subject: Re: [sqlite] Database file size
>
>
> Am 28.11.2008 um 09:20 schrieb Simon Bulman:
>
>> Hi Jens,
>>
>> Thanks for your input. UTF-8 did not make a difference. I expected
>> that the SQLite file would be larger on disk than our proprietary
>> format because of the overheads that you mention - I am surprised,
>> however, that it is at least 7x larger.
>
> To be honest - given your table definitions below, I'm surprised the
> database is _that_ large, too:
>
> Table 1 - according to your definition - should contain at most about
> 50 bytes of pure data per row (plus the overhead needed by SQLite).
> Table 2 would only contain ~ 16 bytes of data per row.
>
> Dividing the database disk size by the total number of rows you
> mentioned, would indicate a whopping 8k per row.
>
> I did a quick test and created a schema similar to what you outlined
> and filled it with data (the same number of rows you mentioned and 28
> and ~20 characters per row for the two varchar columns) and my
> database ended up being 71kB in size instead of the 11.8 MB you saw...
>
> Are there any other tables that contain non-negligible amounts of
> data? Are the data sizes indeed what's indicated in the schema?
> (Since SQLite doesn't really care about the varchar size constraints,
> you can actually put any amount of data into a varchar(30) column.)
>
> </jum>
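Jens's point about the size constraints is easy to check: SQLite treats VARCHAR(30) as plain TEXT affinity, so the declared length is not enforced. A small sketch using an in-memory database:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1 (id BIGINT, name VARCHAR(30))")
# A 500-character string goes into a VARCHAR(30) column without complaint:
con.execute("INSERT INTO t1 VALUES (1, ?)", ("x" * 500,))
stored = con.execute("SELECT length(name) FROM t1").fetchone()[0]
print(stored)  # 500 - the declared size is not enforced
```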
>
>
>>
>> I am actually recreating the whole database (delete file and
>> recreate) programmatically, so vacuuming has no effect.
>>
>> Cheers,
>> S.
>>
>> -----Original Message-----
>> From: Jens Miltner [mailto:[EMAIL PROTECTED]
>> Sent: 27 November 2008 13:48
>> To: General Discussion of SQLite Database
>> Cc: [EMAIL PROTECTED]
>> Subject: Re: [sqlite] Database file size
>>
>>
>> Am 27.11.2008 um 09:12 schrieb Simon Bulman:
>>
>>> I have been playing around with SQLite as an alternative to one of
>>> our proprietary file formats used to read large amounts of data. Our
>>> proprietary format performs very badly, i.e. it takes a long time to
>>> load some data; as expected, SQLite is lightning quick in
>>> comparison - great!
>>>
>>> One considerable stumbling block is the footprint (size) of the
>>> database file on disk. It turns out that SQLite is roughly 7x larger
>>> than our proprietary format - this is prohibitive. The data is
>>> pretty simple really, 2 tables:
>>>
>>> Table 1
>>>
>>> BIGINT (index),  VARCHAR(30), VARCHAR(10)
>>>
>>>
>>> Table 2
>>>
>>> BIGINT (index), FLOAT
>>>
>>>
>>> For a particular data set Table 1 has 1165 rows and Table 2 has 323
>>> rows; however, typically Table 2 becomes bigger for larger models.
>>> The size on disk of this file is 11.8 Mb (compared to 1.7 Mb for our
>>> proprietary format). I have noticed that if I drop the indexes the
>>> size drops dramatically - however, the query performance suffers to
>>> an unacceptable level.
>>>
>>> For a larger model the DB footprint is 2.2 Gb compared to 267 Mb for
>>> the
>>> proprietary format.
>>>
>>> Does anybody have any comments on this? Are there any configuration
>>> options
>>> or ideas I could use to reduce the footprint of the db file?
>>
>>
>> I don't think you'll be able to make SQLite as efficient (regarding
>> storage size) as a custom file format, because it has to have some
>> overhead for indexes, etc.
>>
>> However, one thing that comes to mind is the way string data is
>> stored:
>> If you're concerned about disk space and your string data is mostly
>> ASCII, make sure your strings are stored as UTF-8 - for ASCII string
>> data, this will save you one byte per character in the string data
>> storage.
>> To enforce UTF-8 string storage, execute "PRAGMA encoding='UTF-8'" as
>> the first command when creating the database (before you create any
>> tables).
>> You can query the format using "PRAGMA encoding" - UTF-16 encodings
>> will store two bytes / character, regardless of the actual
>> characters...
>>
>> Note that this doesn't mean your database size will shrink to half
>> the size - it merely means you'll be able to fit more rows onto a
>> single page, so eventually you should see a decrease in file size
>> when comparing UTF-16 vs. UTF-8 databases.
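The effect of the encoding pragma can be sketched as follows (the file names and row counts here are arbitrary illustration, not figures from the thread):

```python
import os, sqlite3, tempfile

def make_db(path, encoding):
    con = sqlite3.connect(path)
    # The encoding PRAGMA only takes effect before any tables are created.
    con.execute("PRAGMA encoding = '%s'" % encoding)
    con.execute("CREATE TABLE t1 (id BIGINT, name VARCHAR(30))")
    rows = [(i, "some ascii text, row %06d" % i) for i in range(5000)]
    con.executemany("INSERT INTO t1 VALUES (?, ?)", rows)
    con.commit()
    con.close()
    return os.path.getsize(path)

tmp = tempfile.mkdtemp()
utf8 = make_db(os.path.join(tmp, "u8.db"), "UTF-8")
utf16 = make_db(os.path.join(tmp, "u16.db"), "UTF-16le")
print(utf8, utf16)  # the UTF-16 file is noticeably larger for ASCII data
```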
>>
>> BTW: are you aware that SQLite databases won't shrink by themselves?
>> You'll have to vacuum them to reclaim unused space (see
>> <http://www.sqlite.org/faq.html#q12>).
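The reclaim step Jens describes can be sketched like this (the file path is a throwaway example):

```python
import os, sqlite3, tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE t2 (id BIGINT, val FLOAT)")
con.executemany("INSERT INTO t2 VALUES (?, ?)",
                [(i, i * 0.5) for i in range(20000)])
con.commit()
before = os.path.getsize(path)

con.execute("DELETE FROM t2")  # pages go to the freelist...
con.commit()
after_delete = os.path.getsize(path)  # ...but the file does not shrink

con.execute("VACUUM")  # rebuilds the file, reclaiming the free pages
con.close()
after_vacuum = os.path.getsize(path)
print(before, after_delete, after_vacuum)
```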
>>
>> HTH,
>> </jum>
>>
>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

D. Richard Hipp
[EMAIL PROTECTED]


