>>>>> Roger Binns writes:
>>>>> On 30/12/11 20:10, Ivan Shmakov wrote:

 >> The problem is that I have a table, each row holding something like
 >> 64 bytes of data (of BLOB and INTEGER types), which don't seem too
 >> compressible, but these rows themselves tend to be similar to each
 >> other, so that the table could still be compressed as a whole, or in
 >> chunks.

 > SQLite stores integers in a variable length encoding using (almost)
 > the minimum bytes necessary.  Consequently keeping your numbers as
 > small as possible will help.

        The integers can be up to 32 bits long, but I expect them to
        “cluster”, e.g.: 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 101, 101, 102,
        102, 102, 103, 103, 768, 768, etc.  My guess is that such
        sequences should be quite compressible, but the point is that
        there will be only a few such numbers per row, and millions of
        such rows in the table.
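        Since SQLite stores small integers in fewer bytes (values in
        −128..127 fit in a single payload byte, and 0 and 1 take no
        payload at all), delta-encoding such clustered sequences before
        insertion keeps most stored values near zero.  A minimal sketch
        (the helper names and the sample sequence are mine):

```python
def delta_encode(values):
    """Replace each value by its difference from the previous one."""
    prev, out = 0, []
    for v in values:
        out.append(v - prev)
        prev = v
    return out

def delta_decode(deltas):
    """Invert delta_encode by running-summing the differences."""
    total, out = 0, []
    for d in deltas:
        total += d
        out.append(total)
    return out

seq = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3,
       101, 101, 102, 102, 102, 103, 103, 768, 768]
enc = delta_encode(seq)   # mostly 0s and 1s, each cheap to store
```

        The deltas would then be what actually gets INSERTed, with the
        decode step applied on the way out.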

 > You don't describe what your blobs are, but again encoding them in
 > the shortest length possible will result in less data and redundancy
 > making compression less significant.

        On second thought, the BLOBs are of little concern: there are
        not as many of them as there are integers, and they're already
        in a separate table, like:

CREATE TABLE "foo" (
    "id"        INTEGER PRIMARY KEY,
    "blob"      BLOB    NOT NULL);

CREATE UNIQUE INDEX "foo-unique"
    ON "foo" ("blob");

        However, this makes me wonder: is there a way to determine the
        filesystem space occupied by a particular table or index in
        SQLite?  It now seems to me that the problem is different from
        what I'd initially presumed.
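        (Answering my own question, tentatively: builds compiled with
        SQLITE_ENABLE_DBSTAT_VTAB expose a dbstat virtual table that
        reports per-object page usage, and the sqlite3_analyzer tool
        gives a fuller report.  A sketch, with the fallback for builds
        lacking dbstat; the sample schema mirrors the one above:)

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "t.db")
db = sqlite3.connect(path)
db.executescript("""
CREATE TABLE "foo" (
    "id"        INTEGER PRIMARY KEY,
    "blob"      BLOB    NOT NULL);
CREATE UNIQUE INDEX "foo-unique" ON "foo" ("blob");
""")
db.executemany('INSERT INTO "foo" ("blob") VALUES (?)',
               [(os.urandom(64),) for _ in range(1000)])
db.commit()

try:
    # Per-table and per-index bytes, if dbstat is compiled in.
    for name, nbytes in db.execute(
            "SELECT name, SUM(pgsize) FROM dbstat GROUP BY name"):
        print(name, nbytes)
except sqlite3.OperationalError:
    # No dbstat; only whole-file accounting is available via pragmas.
    page_size = db.execute("PRAGMA page_size").fetchone()[0]
    page_count = db.execute("PRAGMA page_count").fetchone()[0]
    print("whole file:", page_size * page_count)
```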

 > The page you linked to is a pretty good writeup on the subject.  You
 > can get good compression using CEROD but that doesn't allow making
 > changes.  Other than that you'll need to put something together
 > yourself.

        ACK.  I'd rarely need either UPDATE or DELETE on the table in
        question, so I could probably move the accumulated data into a
        separate read-only, compressed database file once in a while,
        though such a design seems rather complex (and fragile) to
        implement.
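        Roughly, the move-then-freeze step I have in mind might look
        like the following sketch (table and file names are made up, and
        VACUUM INTO needs SQLite >= 3.27; a compressing VFS would be
        needed to read the .gz directly, otherwise one decompresses
        first):

```python
import gzip
import os
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
live = os.path.join(workdir, "live.db")
archive = os.path.join(workdir, "archive.db")

db = sqlite3.connect(live)
db.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, v INTEGER)")
db.executemany("INSERT INTO samples (v) VALUES (?)",
               [(i % 3,) for i in range(1000)])
db.commit()

# Snapshot the accumulated rows into a standalone file...
db.execute(f"VACUUM INTO '{archive}'")
# ...then drop them from the live database.
db.execute("DELETE FROM samples")
db.commit()

# Compress the frozen snapshot at the filesystem level.
with open(archive, "rb") as f, gzip.open(archive + ".gz", "wb") as g:
    g.write(f.read())
```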

 > You can compress your fields individually but then you'll only be
 > able to do equality comparisons (not ordering or partial matches).
 > Using a precomputed dictionary can give good results even on short
 > fields.  Here is an example for English(ish) text:

 > https://github.com/antirez/smaz
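        For the record, the same precomputed-dictionary idea is
        available in Python's zlib via its zdict parameter (smaz itself
        is a C library; the sample dictionary below is made up):

```python
import zlib

# A shared dictionary seeded with byte strings typical of the
# fields to be stored; both sides must use the identical bytes.
ZDICT = b"status=active status=deleted user-id= record-type=sample "

def compress_field(data: bytes) -> bytes:
    """Deflate-compress one short field against the shared dictionary."""
    c = zlib.compressobj(level=9, zdict=ZDICT)
    return c.compress(data) + c.flush()

def decompress_field(blob: bytes) -> bytes:
    """Invert compress_field; the same dictionary must be supplied."""
    d = zlib.decompressobj(zdict=ZDICT)
    return d.decompress(blob) + d.flush()

field = b"status=active user-id=1017"
packed = compress_field(field)
```

        As you say, such per-field blobs remain comparable for equality
        only, since compression does not preserve byte order.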

 > Other than that you'll either need to work at the filesystem level or
 > the SQLite page level as the article describes.  Here is another
 > compressor to consider:

 > http://blosc.pytables.org/trac

 > But I suggest you start at the beginning with an understanding of the
 > SQLite file format:

 > http://www.sqlite.org/fileformat2.html

        ACK.  Thanks!

-- 
FSF associate member #7257

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
