>>>>> Roger Binns writes:
>>>>> On 30/12/11 20:10, Ivan Shmakov wrote:
>> The problem is that I have a table, each row holding something like
>> 64 bytes of data (of BLOB and INTEGER types), which don't seem too
>> compressible, but these rows themselves tend to be similar to each
>> other, so that the table could still be compressed as a whole, or in
>> chunks.
> SQLite stores integers in a variable length encoding using (almost)
> the minimum bytes necessary. Consequently keeping your numbers as
> small as possible will help.
The integers can be up to 32 bits long, but I deem them
likely to “cluster”, e. g.: 1, 1, 1, 1, 2, 2, 2, 3, 3, 3,
101, 101, 102, 102, 102, 103, 103, 768, 768, etc. My guess is
that such sequences should be quite compressible, but the point
is that there will only be a few such numbers per row, and
millions of such rows in the table.
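FWIW, a quick and purely illustrative check with zlib suggests
that such clustered sequences do compress far better than
unrelated 32-bit values would; a rough Python sketch, with
made-up run lengths:

    import random
    import struct
    import zlib

    # Long runs of nearly equal small numbers, as in the example above.
    clustered = [v for v in (1, 2, 3, 101, 102, 103, 768)
                 for _ in range(10000)]
    # A control sequence of unrelated 32-bit values.
    scattered = [random.getrandbits(32) for _ in range(len(clustered))]

    def sizes(values):
        """Pack VALUES as raw 32-bit integers; return (raw, compressed) sizes."""
        raw = struct.pack("<%dI" % len(values), *values)
        return len(raw), len(zlib.compress(raw, 9))

    print("clustered:", sizes(clustered))   # compresses to a small fraction
    print("scattered:", sizes(scattered))   # stays close to its raw size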
> You don't describe what your blobs are, but again encoding them in
> the shortest length possible will result in less data and redundancy,
> making compression less significant.
On second thought, the BLOBs are of little concern: there are
far fewer of them than there are integers, and they're already
kept in a separate table, like:
CREATE TABLE "foo" (
"id" INTEGER PRIMARY KEY,
"blob" BLOB NOT NULL);
CREATE UNIQUE INDEX "foo-unique"
ON "foo" ("blob");
However, this makes me wonder: is there a way to determine the
filesystem space occupied by a particular table, or index, in
SQLite? It now seems to me that the problem is different from
what I initially presumed.
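(As far as I can tell, the sqlite3_analyzer utility distributed
with SQLite reports exactly that, i. e. space per table and per
index, and builds that have the optional dbstat virtual table
compiled in expose the same numbers to SQL; a sketch, assuming
such a build and a hypothetical example.db:)

    import sqlite3

    db = sqlite3.connect("example.db")      # hypothetical database file
    try:
        # dbstat yields one row per page; summing pgsize per btree name
        # gives the space used by each table and index.
        for name, nbytes, pages in db.execute(
                "SELECT name, SUM(pgsize), COUNT(*) FROM dbstat"
                " GROUP BY name ORDER BY 2 DESC"):
            print("%-24s %10d bytes %6d pages" % (name, nbytes, pages))
    except sqlite3.OperationalError:
        print("no dbstat in this build; try sqlite3_analyzer instead")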
> The page you linked to is a pretty good writeup on the subject. You
> can get good compression using CEROD but that doesn't allow making
> changes. Other than that you'll need to put something together
> yourself.
ACK. I'd rarely need either UPDATE or DELETE on the table in
question, so I could probably move the accumulated data into a
separate read-only, compressed database file once in a while,
though such a design seems rather complex (and fragile) to
implement.
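Roughly, the kind of thing I have in mind (a sketch only; the
samples table, its ts column, the cutoff value and the file
names are invented for illustration):

    import sqlite3

    cutoff = 1325203200                     # hypothetical timestamp boundary

    db = sqlite3.connect("live.db")         # hypothetical "live" database
    db.execute("ATTACH DATABASE 'archive.db' AS archive")
    db.execute("CREATE TABLE IF NOT EXISTS archive.samples"
               " AS SELECT * FROM main.samples WHERE 0")

    with db:    # one transaction: copy and delete together, or not at all
        db.execute("INSERT INTO archive.samples"
                   " SELECT * FROM main.samples WHERE ts < ?", (cutoff,))
        db.execute("DELETE FROM main.samples WHERE ts < ?", (cutoff,))

    db.execute("DETACH DATABASE archive")
    # archive.db could then be VACUUMed, compressed (CEROD, or at the
    # filesystem level) and only ever opened read-only afterwards.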
> You can compress your fields individually but then you'll only be
> able to do equality comparisons (not ordering or partial matches).
> Using a precomputed dictionary can give good results even on short
> fields. Here is an example for English(ish) text:
> https://github.com/antirez/smaz
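(Noted; for my own reference, a sketch of what that could look
like, with zlib standing in for smaz or any other compressor and
an invented msg table. Equality still works because a lossless,
deterministic compressor maps equal values, and only equal
values, to equal blobs:)

    import sqlite3
    import zlib

    def pack(text):
        # zlib here is only a stand-in for smaz or any other compressor;
        # the scheme relies on it being lossless and deterministic.
        return zlib.compress(text.encode("utf-8"), 9)

    db = sqlite3.connect(":memory:")
    db.execute('CREATE TABLE "msg" ("id" INTEGER PRIMARY KEY, "body" BLOB)')
    db.execute('CREATE INDEX "msg-body" ON "msg" ("body")')
    db.execute('INSERT INTO "msg" ("body") VALUES (?)',
               (pack("hello, world"),))

    # Equality lookups still work (and can use the index): compress the
    # probe value the same way and compare the compressed forms.
    print(db.execute('SELECT "id" FROM "msg" WHERE "body" = ?',
                     (pack("hello, world"),)).fetchone())
    # Ordering and partial (LIKE) matches on "body" are lost, as noted.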
> Other than that you'll either need to work at the filesystem level or
> the SQLite page level as the article describes. Here is another
> compressor to consider:
> http://blosc.pytables.org/trac
> But I suggest you start at the beginning with an understanding of the
> SQLite file format:
> http://www.sqlite.org/fileformat2.html
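(As a tiny first step in that direction, the fixed 100-byte
header described on that page is easy to pick apart; the offsets
below are the documented ones, the file name is made up:)

    import struct

    def inspect_header(path):
        """Read a few fields of the 100-byte SQLite database header."""
        with open(path, "rb") as f:
            header = f.read(100)
        assert header[:16] == b"SQLite format 3\x00", "not an SQLite 3 file"
        (page_size,) = struct.unpack(">H", header[16:18])
        if page_size == 1:                  # the spec's encoding for 65536
            page_size = 65536
        (encoding,) = struct.unpack(">I", header[56:60])
        return page_size, encoding          # encoding: 1=UTF-8, 2/3=UTF-16

    print(inspect_header("example.db"))     # hypothetical file name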
ACK. Thanks!
--
FSF associate member #7257