Re: [sqlite] Questions on big blobs and blob I/O

2017-03-29 Thread Stephen Chrzanowski
@Jens;

I'd take what David suggested one step further: remove the BLOB from the
tables of concern entirely, keep it in its own table with a 1:1 PK:FK
relationship, and only look at the BLOB table when required.  That way, if
you do an occasional [select * from ...] you're not reading the BLOB,
because it lives somewhere else.

I make this suggestion not just for SQLite but for any DBMS I've developed
for: keep the really big data somewhere else, with a relationship back to
the primary resource, and then only query that massive chunk when required,
through a specialized view or select.
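
For concreteness, here is a minimal sketch of that layout against the SQLite
C API; the table and column names (doc, doc_blob, data) are made up for
illustration, not anything from this thread:

    /* A minimal sketch of the "blob in its own table" layout, with
     * hypothetical table/column names.  Build with: cc demo.c -lsqlite3 */
    #include <sqlite3.h>
    #include <stdio.h>

    int main(void) {
        sqlite3 *db;
        char *err = NULL;
        if (sqlite3_open("attachments.db", &db) != SQLITE_OK) return 1;

        const char *schema =
            "PRAGMA foreign_keys = ON;"               /* enforce the FK */
            "CREATE TABLE IF NOT EXISTS doc ("
            "  id   INTEGER PRIMARY KEY,"
            "  name TEXT,"
            "  size INTEGER);"
            /* 1:1 PK:FK relationship; the big data lives only here */
            "CREATE TABLE IF NOT EXISTS doc_blob ("
            "  doc_id INTEGER PRIMARY KEY REFERENCES doc(id) ON DELETE CASCADE,"
            "  data   BLOB);";
        if (sqlite3_exec(db, schema, NULL, NULL, &err) != SQLITE_OK) {
            fprintf(stderr, "schema error: %s\n", err);
            sqlite3_free(err);
        }

        /* "SELECT * FROM doc" never touches blob pages; the blob is read
         * only when it is actually wanted:
         *     SELECT data FROM doc_blob WHERE doc_id = ?;              */
        sqlite3_close(db);
        return 0;
    }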

On Wed, Mar 29, 2017 at 2:11 PM, David Raymond <david.raym...@tomtom.com>
wrote:

> Remember to make the blob field the very last field in your table schema,
> and avoid "select * from", otherwise you can create performance problems.
> The data for a record is stored in the same order as the fields are
> defined, and overflow pages are a linked list. So if you have your 100 MB
> blob first, and your ID second, then you will have to read through all 100
> MB of the blob to get to your ID.
>
> For the freeing up of space check out the incremental version of auto
> vacuum. Rather than freeing it all up at once it might be good to run it
> "incrementally" to avoid 1 big performance hit. Maybe have something run
> every so often to free up a set amount of pages, similar to a scheduled WAL
> checkpointer.
>
>
> -Original Message-
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> On Behalf Of Jens Alfke
> Sent: Wednesday, March 29, 2017 1:14 PM
> To: SQLite mailing list
> Subject: [sqlite] Questions on big blobs and blob I/O
>
> I’m thinking of transitioning from external to internal storage of large
> blobs. That is, currently I store these as individual files in a directory
> next to the SQLite database. But it sounds like today’s new improved SQLite
> is better at managing big blobs, and supports streaming reads; and it would
> simplify my code to not have to deal with all those files.
>
> (These blobs are created by a “document attachments” API of the library I
> work on. What’s in them is up to the developer. They’re often fairly small,
> e.g. a thumbnail JPEG, but they could be large media files like movies, in
> the tens or hundreds of megabytes. There’s no upper limit.)
>
> I have three questions:
>
> * Do big blobs cause any performance problems for the db as a whole, like
> fragmentation? In particular, is the blob data stored contiguously in the
> file?
>
> * After a big blob is deleted, my understanding is that the db file won’t
> shrink, so the space isn’t reclaimed to the filesystem. Instead the freed
> pages in the db file will be reused later on. If I urgently need to shrink
> the file, do I have to run a full (expensive) vacuum? (The problem with
> vacuum is that it requires lots of free space to work, but blob deletions
> may be happening on behalf of the user, to free up storage when the
> [mobile] device is running out of free space, creating a catch-22…)
>
> * Is there any way to avoid a sqlite3_blob handle being invalidated when
> the blob’s row is deleted (via a DELETE on another db handle?) I’d prefer
> to have behavior like file deletion in Unix, where open file handles keep
> working even after a file is deleted. I’m wondering whether I can
> accomplish this by opening a read transaction before opening the blob, then
> leaving it open until the blob is closed. Since I’m using WAL, will this db
> handle keep a snapshot of the time before the blob’s row was deleted,
> allowing the blob handle to keep working?
>
> Thanks!
>
> —Jens


Re: [sqlite] Questions on big blobs and blob I/O

2017-03-29 Thread Richard Hipp
On 3/29/17, David Raymond  wrote:
> Remember to make the blob field the very last field in your table schema,
> and avoid "select * from", otherwise you can create performance problems.
> The data for a record is stored in the same order as the fields are defined,
> and overflow pages are a linked list. So if you have your 100 MB blob first,
> and your ID second, then you will have to read through all 100 MB of the
> blob to get to your ID.

David is correct.  You will do well to follow his advice.

However, for completeness let me add that sometimes SQLite is able to
avoid walking the whole linked list in order to get to data that lives
at the end.

There is a special optimization in the code, which only works when
auto_vacuum is enabled, that sometimes allows SQLite to skip over the
intermediate pages of a big BLOB and go right to the page at the end
that contains your ID.  Key points are that this optimization does not
work every time (it falls back to walking the list when it fails), and
it never works unless auto_vacuum is turned on.
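
If it helps to see that precondition concretely, here is a hedged sketch of
switching a database into auto_vacuum mode; the pragma only takes effect on a
brand-new (empty) database, or on an existing one after a VACUUM rebuild:

    /* Sketch: satisfying the precondition for the optimization above. */
    #include <sqlite3.h>

    int enable_auto_vacuum(sqlite3 *db) {
        /* INCREMENTAL also qualifies; FULL is shown here. */
        int rc = sqlite3_exec(db, "PRAGMA auto_vacuum = FULL;", 0, 0, 0);
        if (rc != SQLITE_OK) return rc;
        /* Needed only if the file already contains tables. */
        return sqlite3_exec(db, "VACUUM;", 0, 0, 0);
    }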
-- 
D. Richard Hipp
d...@sqlite.org


Re: [sqlite] Questions on big blobs and blob I/O

2017-03-29 Thread David Raymond
Remember to make the blob field the very last field in your table schema, and 
avoid "select * from", otherwise you can create performance problems. The data 
for a record is stored in the same order as the fields are defined, and 
overflow pages are a linked list. So if you have your 100 MB blob first, and 
your ID second, then you will have to read through all 100 MB of the blob to 
get to your ID.
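
For concreteness, a small sketch of that layout with hypothetical table and
column names; the only points are that the BLOB column is declared last and
that lookups name the columns they need rather than using "*":

    /* Sketch of the advice above (hypothetical names): the BLOB is the
     * last column declared, and queries list only the small columns. */
    #include <sqlite3.h>

    static const char *SCHEMA =
        "CREATE TABLE attachment ("
        "  id      INTEGER PRIMARY KEY,"
        "  type    TEXT,"
        "  length  INTEGER,"
        "  content BLOB)";        /* the big blob is deliberately last */

    /* SQLite can return these without walking the blob's overflow chain. */
    static const char *LOOKUP =
        "SELECT id, type, length FROM attachment WHERE id = ?";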

For freeing up space, check out the incremental version of auto vacuum. 
Rather than freeing it all up at once, it might be good to run it 
"incrementally" to avoid one big performance hit. Maybe have something run 
every so often to free up a set amount of pages, similar to a scheduled WAL 
checkpointer.
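
A minimal sketch of that periodic reclaim, assuming the database was created
with auto_vacuum set to INCREMENTAL; the page count of 200 is just an
arbitrary example:

    /* Sketch of a periodic "reclaim a little at a time" task. */
    #include <sqlite3.h>

    int reclaim_some_pages(sqlite3 *db) {
        /* Moves up to 200 free pages to the end of the file and truncates,
         * spreading the cost instead of taking one big hit. */
        return sqlite3_exec(db, "PRAGMA incremental_vacuum(200);", 0, 0, 0);
    }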


-Original Message-
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On 
Behalf Of Jens Alfke
Sent: Wednesday, March 29, 2017 1:14 PM
To: SQLite mailing list
Subject: [sqlite] Questions on big blobs and blob I/O

I’m thinking of transitioning from external to internal storage of large blobs. 
That is, currently I store these as individual files in a directory next to the 
SQLite database. But it sounds like today’s new improved SQLite is better at 
managing big blobs, and supports streaming reads; and it would simplify my code 
to not have to deal with all those files.

(These blobs are created by a “document attachments” API of the library I work 
on. What’s in them is up to the developer. They’re often fairly small, e.g. a 
thumbnail JPEG, but they could be large media files like movies, in the tens or 
hundreds of megabytes. There’s no upper limit.)

I have three questions:

* Do big blobs cause any performance problems for the db as a whole, like 
fragmentation? In particular, is the blob data stored contiguously in the file?

* After a big blob is deleted, my understanding is that the db file won’t 
shrink, so the space isn’t reclaimed to the filesystem. Instead the freed pages 
in the db file will be reused later on. If I urgently need to shrink the file, 
do I have to run a full (expensive) vacuum? (The problem with vacuum is that it 
requires lots of free space to work, but blob deletions may be happening on 
behalf of the user, to free up storage when the [mobile] device is running out 
of free space, creating a catch-22…)

* Is there any way to avoid a sqlite3_blob handle being invalidated when the 
blob’s row is deleted (via a DELETE on another db handle?) I’d prefer to have 
behavior like file deletion in Unix, where open file handles keep working even 
after a file is deleted. I’m wondering whether I can accomplish this by opening 
a read transaction before opening the blob, then leaving it open until the blob 
is closed. Since I’m using WAL, will this db handle keep a snapshot of the time 
before the blob’s row was deleted, allowing the blob handle to keep working?

Thanks!

—Jens


Re: [sqlite] Questions on big blobs and blob I/O

2017-03-29 Thread Dan Kennedy

On 03/30/2017 12:52 AM, Olivier Mascia wrote:

>> On 29 March 2017 at 19:40, Simon Slavin  wrote:
>>
>>> Since I’m using WAL, will this db handle keep a snapshot of the time before
>>> the blob’s row was deleted, allowing the blob handle to keep working?
>
> As Simon said wisely, a word of the developers would clarify this better.
>
> Yet, since you're using WAL it should work.  Else the implementation of the
> blob API would not obey the transactional/isolation rules and I suppose it
> would qualify as a (horrible) bug.


Transactions are isolated in SQLite, and in WAL mode readers do not 
block writers. So it will work.


Separate database connections for the reader and writer of course.

Dan.
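
For what it's worth, here is a minimal sketch of the pattern being discussed,
under the assumptions above (WAL mode, with the reader and the deleting writer
on separate connections); the doc_blob/data names are hypothetical:

    /* Minimal sketch: hold a read transaction around the blob handle. */
    #include <sqlite3.h>
    #include <stdio.h>

    int stream_blob(sqlite3 *reader, sqlite3_int64 rowid) {
        sqlite3_blob *blob = NULL;
        char buf[8192];
        int rc, total, offset = 0;

        /* Start a read transaction; its snapshot is fixed at the first
         * read (the blob open below), so a DELETE committed afterwards
         * on another connection is simply not visible here. */
        rc = sqlite3_exec(reader, "BEGIN;", 0, 0, 0);
        if (rc != SQLITE_OK) return rc;

        rc = sqlite3_blob_open(reader, "main", "doc_blob", "data",
                               rowid, 0 /* read-only */, &blob);
        if (rc != SQLITE_OK) {
            sqlite3_exec(reader, "ROLLBACK;", 0, 0, 0);
            return rc;
        }

        total = sqlite3_blob_bytes(blob);
        while (offset < total && rc == SQLITE_OK) {
            int n = total - offset;
            if (n > (int)sizeof(buf)) n = (int)sizeof(buf);
            rc = sqlite3_blob_read(blob, buf, n, offset);
            if (rc == SQLITE_OK) {
                fwrite(buf, 1, (size_t)n, stdout);  /* stream the chunk out */
                offset += n;
            }
        }

        sqlite3_blob_close(blob);
        sqlite3_exec(reader, "COMMIT;", 0, 0, 0);
        return rc;
    }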




Re: [sqlite] Questions on big blobs and blob I/O

2017-03-29 Thread Olivier Mascia
> On 29 March 2017 at 19:40, Simon Slavin  wrote:
> 
>> Since I’m using WAL, will this db handle keep a snapshot of the time before 
>> the blob’s row was deleted, allowing the blob handle to keep working?

As Simon said wisely, a word of the developers would clarify this better.

Yet, since you're using WAL it should work.  Else the implementation of the 
blob API would not obey the transactional/isolation rules and I suppose it 
would qualify as a (horrible) bug.

-- 
Best Regards, Meilleures salutations, Met vriendelijke groeten,
Olivier Mascia, http://integral.software




Re: [sqlite] Questions on big blobs and blob I/O

2017-03-29 Thread Simon Slavin

On 29 Mar 2017, at 6:14pm, Jens Alfke  wrote:

> * Do big blobs cause any performance problems for the db as a whole, like 
> fragmentation? In particular, is the blob data stored contiguously in the 
> file?

Blobs are stored in the same pages that other values are stored in.  If a blob 
is bigger than a page it needs to be split, with the continuation on another 
page.  So yes, this can lead to internal fragmentation of the database.

However, with modern storage systems fragmentation causes very little decrease 
in access time.  Access times for 'sectors' of solid-state storage are not 
related to the order in which the sectors are stored.  And even on rotating 
storage, so much use is made of virtual sectors for wear-levelling purposes 
that consecutive sectors are often not physically consecutive.  Any advice to 
users to run defragmentation software I have to consider an obsolete hang-over 
from older storage systems, one that merely causes additional pointless wear.

> * After a big blob is deleted, my understanding is that the db file won’t 
> shrink, so the space isn’t reclaimed to the filesystem. Instead the freed 
> pages in the db file will be reused later on. If I urgently need to shrink 
> the file, do I have to run a full (expensive) vacuum?

All the above is correct, as far as I know.  Pages and parts of pages are not 
released from filespace, they’re just listed as 'free' and will be used when 
SQLite needs more storage space for that database.  In your situation you 
should take a look at the auto_vacuum pragma:

<https://www.sqlite.org/pragma.html#pragma_auto_vacuum>

Use FULL mode, or perhaps use INCREMENTAL mode and perform the incremental 
vacuums as your app quits.  Note that both of these modes are misnamed.  Read 
the descriptions, don’t depend on the names.
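
A sketch of the run-it-at-quit variant, again assuming the file is already in
incremental auto-vacuum mode:

    /* Sketch of reclaiming free pages as the app quits. */
    #include <sqlite3.h>

    void vacuum_on_quit(sqlite3 *db) {
        /* With no page-count argument the entire freelist is released. */
        sqlite3_exec(db, "PRAGMA incremental_vacuum;", 0, 0, 0);
    }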

> * Is there any way to avoid a sqlite3_blob handle being invalidated when the 
> blob’s row is deleted (via a DELETE on another db handle?) I’d prefer to have 
> behavior like file deletion in Unix, where open file handles keep working 
> even after a file is deleted. I’m wondering whether I can accomplish this by 
> opening a read transaction before opening the blob, then leaving it open 
> until the blob is closed. Since I’m using WAL, will this db handle keep a 
> snapshot of the time before the blob’s row was deleted, allowing the blob 
> handle to keep working?

I think that the answer here is negative (unless you play C-type pointer games 
with memory blocks) but I have to let the devs answer this one.

Simon.


[sqlite] Questions on big blobs and blob I/O

2017-03-29 Thread Jens Alfke
I’m thinking of transitioning from external to internal storage of large blobs. 
That is, currently I store these as individual files in a directory next to the 
SQLite database. But it sounds like today’s new improved SQLite is better at 
managing big blobs, and supports streaming reads; and it would simplify my code 
to not have to deal with all those files.

(These blobs are created by a “document attachments” API of the library I work 
on. What’s in them is up to the developer. They’re often fairly small, e.g. a 
thumbnail JPEG, but they could be large media files like movies, in the tens or 
hundreds of megabytes. There’s no upper limit.)

I have three questions:

* Do big blobs cause any performance problems for the db as a whole, like 
fragmentation? In particular, is the blob data stored contiguously in the file?

* After a big blob is deleted, my understanding is that the db file won’t 
shrink, so the space isn’t reclaimed to the filesystem. Instead the freed pages 
in the db file will be reused later on. If I urgently need to shrink the file, 
do I have to run a full (expensive) vacuum? (The problem with vacuum is that it 
requires lots of free space to work, but blob deletions may be happening on 
behalf of the user, to free up storage when the [mobile] device is running out 
of free space, creating a catch-22…)

* Is there any way to avoid a sqlite3_blob handle being invalidated when the 
blob’s row is deleted (via a DELETE on another db handle?) I’d prefer to have 
behavior like file deletion in Unix, where open file handles keep working even 
after a file is deleted. I’m wondering whether I can accomplish this by opening 
a read transaction before opening the blob, then leaving it open until the blob 
is closed. Since I’m using WAL, will this db handle keep a snapshot of the time 
before the blob’s row was deleted, allowing the blob handle to keep working?

Thanks!

—Jens