Dominique Devienne wrote:
> > Looking at the sqlite web site and mailing lists shows that the SQLite
> > team has taken a stab at answering the question, "is it faster to read
> > a blob out of sqlite or out of a file?". See the links below.
> >
> > Does the team have analogous guidance regarding write speeds?
> >
>
> Depends if you care only about the initial write, or also about
> incremental writes.
Thanks to those of you who responded already! I'll give a little
more flavor for those who are interested.
My particular application is a cache of large, immutable files. The
UPDATE case doesn't apply to me for this particular application.
I already have a system on top of SQLite's blobs in which I store the
file data in chunks, one chunk per row. (I'm downloading the files from
the network, and at the moment am using a version of the Python SQLite
API that does not have hooks to the incremental blob I/O.) So I
download a few megabytes, start the next chunk download in the
background (using the "requests-futures" Python library, which is
really sexy BTW), write out the current chunk, commit, continue. The
row also has the byte range that it is storing:
CREATE TABLE IF NOT EXISTS file(
    fileID TEXT    NOT NULL,
    start  INTEGER NOT NULL,   -- offset of first byte of chunk
    end    INTEGER NOT NULL,   -- offset of last byte of chunk
    bytes  BLOB    NOT NULL,
    CONSTRAINT BlobSize CHECK(length(bytes) == (end - start + 1))
);
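In other words, the write path is just "insert one row per chunk, commit,
repeat". Schematically it looks something like the code below (simplified:
the real code overlaps the next chunk's download with the current insert via
requests-futures, and the function/path names here are just illustrative):

import sqlite3

def store_file(db_path, file_id, chunks):
    """Insert one row per downloaded chunk into the 'file' table above.

    'chunks' is an in-order iterable of bytes objects (one per network
    download); the database at db_path already has the schema above.
    """
    con = sqlite3.connect(db_path)
    try:
        offset = 0
        for data in chunks:
            start, end = offset, offset + len(data) - 1
            con.execute(
                "INSERT INTO file(fileID, start, end, bytes) VALUES (?, ?, ?, ?)",
                (file_id, start, end, data),
            )
            con.commit()   # commit per chunk; the commit interval is one of the knobs I've varied
            offset = end + 1
    finally:
        con.close()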
All requests are for retrieval of a particular range of bytes for a
particular file. So I have an index:
CREATE INDEX IF NOT EXISTS FileIdxNameStart ON file(
    fileID,
    start
);
and I create my own little "query plan" in application logic for
servicing a particular request on the read side. The read side is
fast enough for my purposes.
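For completeness, that application-side plan boils down to "select the chunk
rows that overlap the requested range, then slice the first and last ones" --
roughly this (illustrative only):

def read_range(con, file_id, req_start, req_end):
    # Rows whose [start, end] interval overlaps [req_start, req_end];
    # the (fileID, start) index above covers the fileID/start part.
    rows = con.execute(
        "SELECT start, end, bytes FROM file"
        " WHERE fileID = ? AND end >= ? AND start <= ?"
        " ORDER BY start",
        (file_id, req_start, req_end),
    )
    out = bytearray()
    for start, end, data in rows:
        lo = max(req_start, start) - start   # slice within this chunk
        hi = min(req_end, end) - start
        out += data[lo:hi + 1]
    return bytes(out)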
Inserts are slow and deletes are slow. The disk I am using is capable
of writing about 6.5MB/sec, but my app + SQLite only hits around
600KB-1MB/sec under this scheme, depending on some variables. That is
at best about half of the speed I expect: if I use 'wget' to just
write a downloaded file straight to disk, I get about 2MB/sec
sustained over hundreds of MB. I've played with the chunk size, the
synchronous pragma, the commit interval, the page size pragma, and
the cache size pragma.
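For reference, the tuning I've been experimenting with is along these lines
(the values shown are examples only, not settings I've settled on):

def apply_tuning(con):
    # page_size only takes effect on a brand-new database (or after VACUUM)
    con.execute("PRAGMA page_size = 4096")
    # a negative cache_size means "this many KiB" rather than pages
    con.execute("PRAGMA cache_size = -20000")
    # NORMAL/OFF trade durability for fewer fsyncs; FULL is the default
    con.execute("PRAGMA synchronous = NORMAL")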
I have not cleared my application of all blame yet, since I have not
taken enough measurements. I'm considering prototyping a C
implementation that uses incremental blob I/O to write out the file
and is cleverer about the download state machine, but I was hoping to
learn from this group whether that is a fool's errand (e.g. if you
know of some particular reason why huge blobs in SQLite are a priori
a bad idea for performance).
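The prototype idea, in outline: reserve the whole blob up front with
zeroblob() in a single row, then stream each downloaded chunk into it through
the incremental blob API. Sketched below using the blobopen() call that
Python's sqlite3 module exposes in newer releases (my binding doesn't have
it, hence the C idea; the equivalent C calls are sqlite3_blob_open() and
sqlite3_blob_write()):

def store_file_incremental(con, file_id, total_size, chunks):
    # One row holds the whole file: reserve the space, then fill it in.
    cur = con.execute(
        "INSERT INTO file(fileID, start, end, bytes)"
        " VALUES (?, ?, ?, zeroblob(?))",
        (file_id, 0, total_size - 1, total_size),
    )
    rowid = cur.lastrowid
    con.commit()
    # Connection.blobopen() needs a recent Python; it wraps sqlite3_blob_open()
    with con.blobopen("file", "bytes", rowid) as blob:
        offset = 0
        for data in chunks:
            blob.seek(offset)
            blob.write(data)          # sqlite3_blob_write() in C
            offset += len(data)
    con.commit()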
Deletes by fileID are really awful compared to a simple filesystem
delete -- like 2 orders of magnitude worse by my eyeballing, at least
with the default 1024-byte page size. Again, I think you'd expect it
to be -- an unlink(2) only has to touch a few pages of disk, whereas
an SQLite delete of 1GB worth of blob rows I guess goes and touches
something every page_size bytes -- on the order of a million pages at
1024 bytes each (right?).
So I'm thinking of just storing the data outside the database, perhaps
retaining an SQLite index. But I'm worried about the safety issues that
arise from that. E.g. I finish writing my file, I write an index
entry and commit it, and immediately afterward I get a power failure.
Am I sure that the file bytes are correct? And so on. All of those
considerations are right in SQLite's wheelhouse -- I'd hate not to be
able to take advantage of it.
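If I did go down that road, the sequence I'd presumably have to get right by
hand looks something like the following (a sketch only -- the 'fileIndex'
table and the exact fsync discipline here are hypothetical, not something
I've built or measured):

import os

def store_file_external(con, data_dir, file_id, data):
    """Write the payload to its own file, durably, then index it in SQLite.

    Ordering matters: the file must be fully on disk (fsync of the file
    and its directory) *before* the index row is committed, so a power
    failure can leave an orphan file but never an index entry pointing
    at garbage.
    """
    path = os.path.join(data_dir, file_id)
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())          # payload bytes are durable
    os.rename(tmp, path)              # atomic within one POSIX filesystem
    dfd = os.open(data_dir, os.O_RDONLY)
    try:
        os.fsync(dfd)                 # make the rename itself durable
    finally:
        os.close(dfd)
    # only now record it in the index ('fileIndex' is a hypothetical table)
    con.execute("INSERT INTO fileIndex(fileID, path, size) VALUES (?, ?, ?)",
                (file_id, path, len(data)))
    con.commit()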
--
Eric A. Rubin-Smith
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users