>>>>> Roger Binns writes:
>>>>> On 30/12/11 20:10, Ivan Shmakov wrote:

 >> The problem is that I have a table, each row holding something like
 >> 64 bytes of data (of BLOB and INTEGER types), which don't seem too
 >> compressible, but these rows themselves tend to be similar to each
 >> other, so that the table could still be compressed as a whole, or
 >> in chunks.

 > SQLite stores integers in a variable length encoding using (almost)
 > the minimum bytes necessary.  Consequently, keeping your numbers as
 > small as possible will help.

	The integers could be up to 32 bits long, but I expect them to
	"cluster", e. g.: 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 101, 101, 102,
	102, 102, 103, 103, 768, 768, etc.

	My guess is that such sequences should compress quite well; the
	catch is that there are only a few such numbers per row, and
	millions of such rows in the table.  (A small delta-encoding
	sketch follows at the end of this message.)

 > You don't describe what your blobs are, but again encoding them in
 > the shortest length possible will result in less data and
 > redundancy, making compression less significant.

	On second thought, the BLOBs are of little concern: there are
	far fewer of them than of the integers, and they're already in
	a separate table, like:

CREATE TABLE "foo" (
    "id"   INTEGER PRIMARY KEY,
    "blob" BLOB NOT NULL);
CREATE UNIQUE INDEX "foo-unique" ON "foo" ("blob");

	However, this makes me wonder: is there a way to determine the
	filesystem space occupied by a particular table, or index, in
	SQLite?  (See the "dbstat" query sketched below.)

	It now seems to me that the problem is different from what I'd
	initially presumed.

 > The page you linked to is a pretty good writeup on the subject.
 > You can get good compression using CEROD but that doesn't allow
 > making changes.  Other than that you'll need to put something
 > together yourself.

	ACK.  I'd rarely need either UPDATE or DELETE on the table in
	question, so I could probably move the accumulated data into a
	separate read-only, compressed database file once in a while,
	though such a design seems rather complex (and fragile) to
	implement.  (A rough ATTACH-based sketch is given at the end of
	this message.)

 > You can compress your fields individually but then you'll only be
 > able to do equality comparisons (not ordering or partial matches).
 > Using a precomputed dictionary can give good results even on short
 > fields.  Here is an example for English(ish) text:

 > https://github.com/antirez/smaz

 > Other than that you'll either need to work at the filesystem level
 > or the SQLite page level as the article describes.  Here is another
 > compressor to consider:

 > http://blosc.pytables.org/trac

 > But I suggest you start at the beginning with an understanding of
 > the SQLite file format:

 > http://www.sqlite.org/fileformat2.html

	ACK.  Thanks!

-- 
FSF associate member #7257
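
	PS.  To illustrate the "clustering" point above: when
	neighbouring rows carry numerically close values, storing
	per-row deltas instead of the absolute values keeps most of the
	stored integers tiny, which both SQLite's variable-length
	record encoding and any compressor run over the pages reward.
	A minimal sketch, with hypothetical table and column names, and
	assuming consecutive "id" values:

CREATE TABLE "samples"
    ("id" INTEGER PRIMARY KEY, "v"  INTEGER NOT NULL);
CREATE TABLE "samples-delta"
    ("id" INTEGER PRIMARY KEY, "dv" INTEGER NOT NULL);
-- store the difference from the previous row; deltas of 0 and 1 need
-- no payload bytes at all, other small ones a single byte
INSERT INTO "samples-delta" ("id", "dv")
    SELECT "id",
           "v" - COALESCE ((SELECT "v" FROM "samples" AS "p"
                             WHERE "p"."id" = "samples"."id" - 1), 0)
      FROM "samples";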
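
	As for the per-table space question above: the sqlite3_analyzer
	utility reports exactly that, and when the "dbstat" virtual
	table is available in the build (the SQLITE_ENABLE_DBSTAT_VTAB
	option in later versions; it's not in every default build), the
	same figures can be queried directly, e. g.:

SELECT "name",
       SUM ("pgsize") AS "bytes",
       COUNT (*)      AS "pages"
  FROM "dbstat"
 GROUP BY "name"
 ORDER BY "bytes" DESC;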
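
	And a rough sketch of the "move the accumulated rows into a
	separate database" idea, reusing the "foo" schema above as an
	example; the archive file name is hypothetical, and compressing
	the resulting file would still have to happen outside of
	SQLite:

ATTACH DATABASE '/path/to/archive.db' AS "archive";
CREATE TABLE IF NOT EXISTS "archive"."foo"
    ("id" INTEGER PRIMARY KEY, "blob" BLOB NOT NULL);
-- copy the rows, then trim the live table, in one transaction
BEGIN;
INSERT INTO "archive"."foo" SELECT * FROM "main"."foo";
DELETE FROM "main"."foo";
COMMIT;
DETACH DATABASE "archive";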
>> The problem is that I have a table, each row holding something like >> 64 bytes of data (of BLOB and INTEGER types), which don't seem too >> compressible, but these rows themselves tend to be similar to each >> other, so that the table could still be compressed as a whole, or in >> chunks. > SQLite stores integers in a variable length encoding using (almost) > the minimum bytes necessary. Consequently keeping your numbers as > small as possible will help. The integers could take up to 32 bits long, but I deem them likely to “cluster”, like, e. g.: 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 101, 101, 102, 102, 102, 103, 103, 768, 768, etc. My guess is that such sequences should be quite compressible, but the point is that there'll only be a few such numbers per row, and millions of such rows in the table. > You don't describe what your blobs are, but again encoding them in > the shortest length possible will result in less data and redundancy > making compression less significant. On a second thought, the BLOB's are of little concern: there are not as many of them as of the integers, and they're already in a separate table, like: CREATE TABLE "foo" ( "id" INTEGER PRIMARY KEY, "blob" BLOB NOT NULL); CREATE UNIQUE INDEX "foo-unique" ON "foo" ("blob"); However, this makes me wonder, is there a way to determine the filesystem space occupied by a particular table, or index, in SQLite? It now seems to me that the problem is different to what I've initially presumed. > The page you linked to is a pretty good writeup on the subject. You > can get good compression using CEROD but that doesn't allow making > changes. Other than that you'll need to put something together > yourself. ACK. I'd rarely really need either UPDATE or DELETE on the table in question, so I could probably move the accumulated data into a separate read-only and compressed database file once in a while, though such a design seems rather complex (and fragile) to implement. > You can compress your fields individually but then you'll only be > able to equality comparisons (not ordering or partial matches). > Using a precomputed dictionary can give good results even on short > fields. Here is an example for English(ish) text: > https://github.com/antirez/smaz > Other than that you'll either need to work at the filesystem level or > the SQLite page level as the article describes. Here is another > compressor to consider: > http://blosc.pytables.org/trac > But I suggest you start at the beginning with an understanding of the > SQLite file format: > http://www.sqlite.org/fileformat2.html ACK. Thanks! -- FSF associate member #7257 _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users