On 31 Dec 2011, at 8:37pm, Ivan Shmakov wrote:

>>>>>> Simon Slavin <slav...@bigfraud.org> writes:
>>>>>> On 31 Dec 2011, at 4:56pm, Ivan Shmakov wrote:
>
>>> The integers could be up to 32 bits long, but I deem them likely
>>> to “cluster”, like, e. g.: 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 101, 101,
>>> 102, 102, 102, 103, 103, 768, 768, etc.  My guess is that such
>>> sequences should be quite compressible, but the point is that
>>> there'll only be a few such numbers per row, and millions of such
>>> rows in the table.
>
>> Thing is, an integer like 103 takes two bytes to store,
>
> Those were given as examples only.  The real values are going to
> use up to 28 bits (roughly, 4 octets) or so.
>
>> and one of those bytes is the byte that indicates what kind of value
>> is being stored.  So it's really only taking one byte per value.
>
> Is it?  I mean, if I know for sure that I'd only ever need, say,
> eight 32-bit integers per row, and pack them into a single BLOB
> instead of having them stored as proper INTEGER's, won't I be
> able to save a few bytes per row?  (For a performance penalty
> when accessing these fields.  Also, I'd have to forget about
> indexes on these, other than a UNIQUE one on the whole
> BLOB-coded tuple, I guess.)
>
> (With a few functions to unpack these values back out of BLOB's,
> I'd even be able to CREATE a suitable VIEW.  Or maybe a VIRTUAL
> TABLE.)

So yes, it depends on whether you need to do searches or sorting on
these columns.  If you don't care about the individual fields then all
you need is a function which turns those integers into the shortest
BLOB it can.
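For what it's worth, here is a rough, untested sketch of that idea using
Python's sqlite3 module.  The table, column and function names are all
invented for illustration, and it uses a fixed four-byte encoding rather
than the shortest possible one, so treat it as a starting point only:

import sqlite3
import struct

def pack_values(*ints):
    # Fixed-width, big-endian packing of N unsigned 32-bit integers.
    return struct.pack('>%dI' % len(ints), *ints)

def unpack_value(blob, index):
    # Pull the index-th 32-bit integer back out of the BLOB.
    return struct.unpack_from('>I', blob, 4 * index)[0]

db = sqlite3.connect(':memory:')
# Register the unpacking helper so SQL (and hence a VIEW) can call it.
db.create_function('unpack_value', 2, unpack_value)

db.execute('CREATE TABLE samples (packed BLOB UNIQUE)')
db.execute('INSERT INTO samples VALUES (?)',
           (pack_values(1, 1, 2, 3, 101, 102, 103, 768),))

# Expose the packed fields as ordinary columns through a VIEW.
cols = ', '.join('unpack_value(packed, %d) AS v%d' % (i, i)
                 for i in range(8))
db.execute('CREATE VIEW samples_v AS SELECT %s FROM samples' % cols)

print(db.execute('SELECT v0, v7 FROM samples_v').fetchone())  # (1, 768)

A variable-length encoding would squeeze small values further, but then
you lose the fixed offsets that keep the VIEW above simple.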
>> And you can't improve on this by compressing individual values, only
>> by compressing at the table level.  And if you're compressing at
>> table level, making any changes is rather slow.
>
> For now, I'm interested mostly in INSERT's.  (And SELECT's, of
> course.)

Well, you obviously realise your trade-offs here.  You can compress at
the field level with minimal trade-off.  You can compress at the row
level, which slows down changes and queries, but not much.  You can
compress at the column or table level, which slows down changes a great
deal.  And because of the way SQLite works internally, you can compress
at the page level.  So you're really trying to figure out which
trade-off is best for you.  It all depends on how big your databases
are and whether they live on just one machine or need to be sent places
frequently.  But disk space is so cheap these days that it's cheaper
for me to go out and buy a terabyte disk than to spend a week writing
clever compression code.

>>> [snip] is there a way to determine the filesystem space occupied by
>>> a particular table, or index, in SQLite?  It now seems to me that
>>> the problem is different from what I'd initially presumed.
>
>> Not inside SQLite's API functions.  You can do it by reading the file
>> format and multiplying the page size by how many pages are used.
>
> ACK, thanks.  Do I understand it correctly that the
> sqlite-analyzer (suggested elsewhere in this thread) does this
> very thing?

Yes.  It prints out lots of information about your database, including
what you asked for.  But it's a stand-alone app, not part of the SQLite
API.

Simon.
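P.S.  For anyone following along, the whole-file version of that "page
size times pages used" arithmetic is available straight from the
pragmas.  A rough Python sketch (the filename is just a placeholder);
the per-table breakdown still needs something that walks each table's
b-tree, which is what the analyzer tool does for you:

import sqlite3

db = sqlite3.connect('example.db')   # placeholder path
page_size  = db.execute('PRAGMA page_size').fetchone()[0]
page_count = db.execute('PRAGMA page_count').fetchone()[0]
freelist   = db.execute('PRAGMA freelist_count').fetchone()[0]

print('database file size :', page_size * page_count, 'bytes')
print('of which free pages:', page_size * freelist, 'bytes')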