On Thu, Oct 29, 2015 at 8:24 PM, Richard Hipp <drh at sqlite.org> wrote:
> On 10/29/15, Dominique Devienne <ddevienne at gmail.com> wrote: > > > > I've discussed blobs a few times already :). Especially the fact you > can't > > have transactional and incremental updates of blobs (i.e. the transaction > > "copies" only the blob pages actually modified). > > Large blobs are transactional when they are stored in SQLite tables. > Even in the context of https://www.sqlite.org/c3ref/blob_write.html ? Don't "it is not possible to increase the size of a BLOB" and "Writes to the BLOB that occurred before the BLOB handle expired are not rolled back" imply that there is a loss of transactionality? > Perhaps you are talking about storing BLOBs in separate files and then > just storing the filename in the SQLite table? > No, I was talking about "in-row" blobs. Sorry, I was indeed unclear. I wish to write blobs incrementally and transactionally *without* that implying having SQLite rewrite the whole blob (to maintain transactionality). Said differently, I'd like incremental blob updates to be transactional *and* efficient, when the amount written is a fraction of the total blob size. The way Oracle achieves that, AFAIK, is that the row contains a small blob that contains the page indexes for the blobs, and the blob itself is stored "out-of-row" in blob dedicated pages. When doing an incremental blob update, only blob pages actually written are copied for the transaction, as well as the page for the row, with the "blob-index" blob now referencing the new blob pages (as well as the same old unmodified blob pages). > If you do have large BLOBs or strings, SQLite handles this best if the > large blob/string is stored in a table by itself and then linked into > other tables using an integer primary key. For example: > > CREATE TABLE BigBlobTab ( > blobid INTEGER PRIMARY KEY, > content BLOB -- big BLOB or text field. > ); > CREATE TABLE OtherStuff ( > field1 VARCHAR(10), > field2 FLOAT, > field3 BOOLEAN, > blobid INT REFERENCES BigBlobTab > ); > But then there's a lifetime management issue. The blob is an "attribute" of the row, and the parent-child relationship is "upside-down". If the "blob" table references the "row" table, then the row table can exist w/o a blob (when it is a non-null attribute). And conversely, the same blob can be used by several rows, or several rows can share the same blob. And either way, you can't enforce the 1:1 relationship that a row attribute implicitly enjoys. (and is required in some cases). Please there's the fact that foreign-keys (and ON DELETE CASCADE) is optional, so any user of the DB file that can break referential integrity of those files, which it can't if the blob is in-row. > This technique work similarly to storing BLOBs in separate files, in > that it does not mix the BLOB content with the rest of the row, making > the row faster to search and update. But it has the advantage over > separate storage that BLOB changes are transactional and all content > is stored in a single file on disk. > I don't want separate storage at all! I'd want my blobs *in* SQLite in fact. But the above basically makes it too inefficient IMHO. > This is how the Fossil VCS (https://www.fossil-scm.org/) stores content, > fwiw. Fossil stores its "blobs" by SHA-1, and sharing of blobs across revisions/branches is in fact a feature. And all those blobs are immutable by design. At least that the impression I got reading the documentation. SQLite is a wonderful piece of software. And I thank you for it. I'm a big fan of it. But the above is one of my main pet peeve with it :). Which is based on my limited comprehension of how SQLite works. Please do correct me if I'm wrong. Thanks, --DD