Dennis Cote <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
> > Dave Gierok <[EMAIL PROTECTED]> wrote:
> >   
> >> It looks like the size of a Sqlite DB ends up being much larger 
> >> (more than 2x) than the size that I calculate for its data set.
> >>
> >> A simple test shows that when creating one table with one integer 
> >> column and filling it with 10000 rows, I get a DB size of 92KB 
> >> instead of what I'd expect to be around 40KB plus some small
> >> overhead for the table definition.  This seems to scale linearly
> >> as I increase the amount of data in the DB.
> >>     
> >
> > SQLite stores 64-bit integers, not 32-bit as you suppose.  And
> > each row also stores a 64-bit integer rowid in addition to the
> > data.  So that it fits in 92KB instead of the (naively expected)
> > 160KB suggests that SQLite is actually doing a reasonable job of
> > compressing the data.
> >   
> I hate to disagree with the author, but that description is not quite 
> accurate. :-)
> 
> SQLite uses variable length integer storage...

No.  I'm going to stand by what I said.  SQLite works with 64-bit
integer values.  When writing those values to the disk, various
compression techniques are used to avoid having to take up 8 bytes
of disk space in the common case where most of those bytes are
going to be zero.  

Various encodings are used.  All of them are Huffman codes over
a fixed probability distribution.
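
To make that concrete, here is a minimal sketch of one such encoding.
It assumes the 1-to-9-byte big-endian "varint" layout described in the
SQLite file format documentation: seven payload bits per byte with the
high bit flagging continuation, and a ninth byte that carries a full
eight bits.  The function names are hypothetical and the code is an
illustration only, not SQLite's actual C implementation.

```python
def encode_varint(value):
    """Encode an unsigned 64-bit integer in 1 to 9 bytes (illustrative sketch)."""
    if not 0 <= value < (1 << 64):
        raise ValueError("value must fit in 64 unsigned bits")
    if value >> 56:
        # Top 8 bits are in use: 9-byte form, last byte holds 8 bits.
        out = [value & 0xFF]
        value >>= 8
        for _ in range(8):
            out.append((value & 0x7F) | 0x80)
            value >>= 7
        return bytes(reversed(out))
    # Common case: 7 bits per byte, continuation bit on all but the last.
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(out))


def decode_varint(buf):
    """Return (value, bytes_consumed) for a varint at the start of buf."""
    value = 0
    for i in range(8):
        byte = buf[i]
        value = (value << 7) | (byte & 0x7F)
        if byte < 0x80:  # high bit clear: this was the last byte
            return value, i + 1
    # Ninth byte contributes all 8 of its bits.
    return (value << 8) | buf[8], 9
```

Under this layout any value below 128 takes one byte and any value
below 16384 takes two, so a rowid in a 10000-row table costs at most
two bytes on disk rather than eight.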

Dennis calls these "variable length integers".  I call them
integers that are compressed using a Huffman code.  That's the
same thing in practice.  But the nomenclature is important because
I can point to Huffman's 1952 paper to prove that the
on-disk representation of integers in SQLite is not patentable.
--
D. Richard Hipp  <[EMAIL PROTECTED]>

