On 31 Dec 2011, at 8:37pm, Ivan Shmakov wrote:

>>>>>> Simon Slavin <slav...@bigfraud.org> writes:
>>>>>> On 31 Dec 2011, at 4:56pm, Ivan Shmakov wrote:
>
>>> The integers could be up to 32 bits long, but I deem them likely
>>> to “cluster”, like, e. g.: 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 101, 101,
>>> 102, 102, 102, 103, 103, 768, 768, etc.  My guess is that such
>>> sequences should be quite compressible, but the point is that
>>> there'll only be a few such numbers per row, and millions of such
>>> rows in the table.
>
>> Thing is, an integer like 103 takes two bytes to store,
>
> Those were given as examples only.  The real values are going to
> use up to 28 bits (roughly, 4 octets) or so.
>
>> and one of those bytes is the byte that indicates what kind of value
>> is being stored.  So it's really only taking one byte per value.
>
> Is it?  I mean, if I know for sure that I'd only ever need, say,
> eight 32-bit integers per row, and pack them into a single BLOB
> instead of having them stored as proper INTEGER's, won't I be
> able to save a few bytes per row?  (For a performance penalty
> when accessing these fields.  Also, I'd have to forget about
> indexes on these, other than a UNIQUE one on the whole
> BLOB-coded tuple, I guess.)
>
> (With a few functions to unpack these values back out of BLOB's,
> I'd even be able to CREATE a suitable VIEW.  Or maybe a VIRTUAL
> TABLE.)

So yes, it depends on whether you need to do searches or sorting on
these columns.  If you don't care about the individual fields then all
you need is a function which turns those integers into the shortest
BLOB it can.
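For what it's worth, here is a rough, untested sketch of that idea using
Python's sqlite3 module.  The table, column and function names are all
invented for illustration, and it uses a fixed four-byte encoding rather
than the shortest possible one, so treat it as a starting point only:

import sqlite3
import struct

def pack_values(*ints):
    # Fixed-width, big-endian packing of N unsigned 32-bit integers.
    return struct.pack('>%dI' % len(ints), *ints)

def unpack_value(blob, index):
    # Pull the index-th 32-bit integer back out of the BLOB.
    return struct.unpack_from('>I', blob, 4 * index)[0]

db = sqlite3.connect(':memory:')
# Register the unpacking helper so SQL (and hence a VIEW) can call it.
db.create_function('unpack_value', 2, unpack_value)

db.execute('CREATE TABLE samples (packed BLOB UNIQUE)')
db.execute('INSERT INTO samples VALUES (?)',
           (pack_values(1, 1, 2, 3, 101, 102, 103, 768),))

# Expose the packed fields as ordinary columns through a VIEW.
cols = ', '.join('unpack_value(packed, %d) AS v%d' % (i, i)
                 for i in range(8))
db.execute('CREATE VIEW samples_v AS SELECT %s FROM samples' % cols)

print(db.execute('SELECT v0, v7 FROM samples_v').fetchone())  # (1, 768)

A variable-length encoding would squeeze small values further, but then
you lose the fixed offsets that keep the VIEW above simple.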
>> And you can't improve on this by compressing individual values, only
>> by compressing at the table level.  And if you're compressing at
>> table level, making any changes is rather slow.
>
> For now, I'm interested mostly in INSERT's.  (And SELECT's, of
> course.)

Well, you obviously realise your trade-offs here.  You can compress at
the field level with minimal trade-off.  You can compress at the row
level, which slows down changes and queries, but not much.  You can
compress at the column or table level, which slows down changes a great
deal.  And because of the way SQLite works internally, you can compress
at the page level.  So you're really trying to figure out which
trade-off is best for you.  It all depends on how big your databases
are and whether they live on just one machine or need to be sent places
frequently.  But disk space is so cheap these days that it's cheaper
for me to go out and buy a terabyte disk than to spend a week writing
clever compression code.

>>> [snip] is there a way to determine the filesystem space occupied by
>>> a particular table, or index, in SQLite?  It now seems to me that
>>> the problem is different from what I'd initially presumed.
>
>> Not inside SQLite's API functions.  You can do it by reading the file
>> format and multiplying the page size by how many pages are used.
>
> ACK, thanks.  Do I understand it correctly that the
> sqlite-analyzer (suggested elsewhere in this thread) does this
> very thing?

Yes.  It prints out lots of information about your database, including
what you asked for.  But it's a stand-alone app, not part of the SQLite
API.

Simon.
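P.S.  For anyone following along, the whole-file version of that "page
size times pages used" arithmetic is available straight from the
pragmas.  A rough Python sketch (the filename is just a placeholder);
the per-table breakdown still needs something that walks each table's
b-tree, which is what the analyzer tool does for you:

import sqlite3

db = sqlite3.connect('example.db')   # placeholder path
page_size  = db.execute('PRAGMA page_size').fetchone()[0]
page_count = db.execute('PRAGMA page_count').fetchone()[0]
freelist   = db.execute('PRAGMA freelist_count').fetchone()[0]

print('database file size :', page_size * page_count, 'bytes')
print('of which free pages:', page_size * freelist, 'bytes')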