Am 02.05.2006 um 18:46 schrieb [EMAIL PROTECTED]:

Felix Schwarz <[EMAIL PROTECTED]> wrote:

I'm wondering whether there is a big
performance hit for a simple

        SELECT binarydata FROM entries WHERE somehash = 27817298;

when I use

        CREATE TABLE entries(
                entry_id INTEGER PRIMARY KEY,
                somehash  INTEGER,
                property1 INTEGER,
                property2  VARCHAR(255),
                property3  VARCHAR(255),
                binarydata BLOB
        );

instead of splitting the binary data (around 40K each) into two
tables like this:

        CREATE TABLE entries(
                entry_id INTEGER PRIMARY KEY,
                somehash  INTEGER,
                property1 INTEGER,
                property2  VARCHAR(255),
                property3  VARCHAR(255),
                binary_id INTEGER
        );

        CREATE TABLE binaries(
                binary_id INTEGER PRIMARY KEY,
                binarydata BLOB
        );


If you do things like this:

   UPDATE entries SET property1=property1+1;

In other words, if you make changes that do not
alter the blob, then the second form can be dramatically
faster since the blob never has to be copied.  The whole blob
has to be copied several times to do this update in the first
form.  In order to do an update, SQLite has to load the entire
row into memory, decode it, substitute the changed value(s),
reencode the row, then write the entire row back out to disk.
So the entire row has to be read and written in order to change
a single byte.  (This is because SQLite uses variable-length
encodings for everything, including integers, so changing a
single byte can change all subsequent bytes in the row.)  So
if you are going to be updating things, it is best to keep
the rows relatively small by moving large blobs out to a
separate table.

If you do things like this:

   SELECT sum(property1) FROM entries;

Then the second form is slightly faster because the entries table
will be smaller and you will get better locality of reference.
SQLite does not read the blob if it is not used here, but it does
have to skip over it.
--
D. Richard Hipp   <[EMAIL PROTECTED]>

Thank you (and everbody else) very much for the answers. They help a lot. However, for the second part of your answer about the second form being faster because the entries table will be smaller and me getting better locality of reference:

If I don't build the two tables in the

        INSERT INTO entries ..
        INSERT INTO entries ..
        INSERT INTO entries ..

        INSERT INTO binaries ..
        INSERT INTO binaries ..
        INSERT INTO binaries ..

but instead do a

        INSERT INTO entries ..
        INSERT INTO binaries ..

        INSERT INTO entries ..
        INSERT INTO binaries ..

        INSERT INTO entries ..
        INSERT INTO binaries ..

over a longer period of time .. does the part of your answer about better locality still apply? Or does this way of creating the table data lead to a loss of this benefit?

Thanks in advance,

Felix

Reply via email to