On Wed, Nov 9, 2011 at 3:24 PM, Fabian <fabianpi...@gmail.com> wrote:
> 2011/11/9 Nico Williams <n...@cryptonector.com>
>> I don't get it.  You're reading practically the whole file in a random
>> manner, which is painfully slow, so why can't you read the file in one
>> fell swoop (i.e., sequential reads)??
>
> I'm only reading the whole file when the number of additional inserts is
> high enough to cause the whole index to be read from disk. But if I always
> pre-cache the database, it will degrade performance for cases when only
> 10 inserts need to be done. And I'd like to avoid having some fuzzy logic
> that tries to predict which of the two methods is going to be faster.

I don't see how to avoid that.  Pick a threshold, say N=100 pending
inserts, at or above which you read the whole thing into memory first.
You'll also need to record, somewhere, whether the DB has been read
since the last reboot (you could use a table in the same DB for this).
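
A minimal sketch of that threshold check, assuming Python's sqlite3
module. The table name cache_state, the function names, and the boot_id
argument are all hypothetical; boot_id stands for any caller-supplied
value that changes on reboot (on Linux, the contents of
/proc/sys/kernel/random/boot_id would be one candidate).

```python
import sqlite3

PREWARM_THRESHOLD = 100  # the N=100 suggested above; tune by measurement


def ensure_state_table(conn):
    # Hypothetical one-row table recording the boot during which the
    # database file was last read in full.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache_state ("
        " id INTEGER PRIMARY KEY CHECK (id = 1),"
        " boot_id TEXT NOT NULL)"
    )


def should_prewarm(conn, pending_inserts, boot_id, threshold=PREWARM_THRESHOLD):
    """Return True if the whole file should be read before inserting."""
    if pending_inserts < threshold:
        return False  # small batch: skip the full read
    row = conn.execute(
        "SELECT boot_id FROM cache_state WHERE id = 1"
    ).fetchone()
    # Pre-read only if it has not already been done during this boot.
    return row is None or row[0] != boot_id


def mark_warmed(conn, boot_id):
    # Record that the file has been read during the current boot.
    conn.execute(
        "INSERT OR REPLACE INTO cache_state (id, boot_id) VALUES (1, ?)",
        (boot_id,),
    )
    conn.commit()
```

Note that "read since last reboot" is only a hint: the OS may have
evicted the pages since then, so this cannot guarantee the file is still
cached.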

> Besides, pre-caching the file sounds easier than it is to accomplish,
> because none of the methods suggested on this list worked on Windows (for
> example copying the file to null). Windows and the hard drive have their own
> logic to decide which data to cache, and I haven't found a simple way to
> force a certain file into cache.

On many operating systems copying a file to /dev/null or equivalent
can fail to read the file into cache.

On Unix, cp(1) might mmap() the file and then write(2) it a page at a
time to /dev/null.  The page faults are deferred until the data is
actually touched, but since /dev/null never uses the data, the faults
never happen, and thus the file is never read into memory.
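
To illustrate the difference between mapping a file and touching its
pages, here is a small sketch using Python's mmap module. The function
name map_and_touch is hypothetical. Creating the mapping typically reads
no file data; it is the per-page access in the loop that can trigger the
page faults.

```python
import mmap


def map_and_touch(path):
    """Map a non-empty file, read one byte per page, return pages touched."""
    with open(path, "rb") as f:
        # Creating the mapping alone typically reads no file data.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            touched = 0
            checksum = 0
            for offset in range(0, len(mm), mmap.PAGESIZE):
                # Accessing a byte is what can fault the page in.
                checksum ^= mm[offset]
                touched += 1
            return touched
```

Without the loop, the function would map and unmap the file without
necessarily reading any of it, which is the failure mode described
above.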

If you want to read the file reliably you may need to use a SELECT, or
actually *read* the file into memory, not just mmap() it.
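
A minimal sketch of actually reading the file, in Python, with a
hypothetical function name and chunk size. It forces the data through
read calls in sequential order; whether the OS keeps those pages cached
afterwards is still up to the OS, and this has not been checked against
the Windows behavior reported above.

```python
def read_whole_file(path, chunk_size=1024 * 1024):
    """Read a file front to back; return the number of bytes read."""
    total = 0
    buf = bytearray(chunk_size)
    # buffering=0 yields a raw file object, so readinto() reads
    # directly into buf without an extra buffering layer.
    with open(path, "rb", buffering=0) as f:
        while True:
            n = f.readinto(buf)
            if not n:  # 0 at end of file
                break
            total += n
    return total
```

Comparing the return value against the file size is a cheap check that
the whole file was read.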

>> Or, if FTS really works better, then use that.
>
> I will, but I'm trying to understand the issue that I'm facing, not just
> work around it. It seems that FTS doesn't need to read the whole index from
> disk, so I'm trying to pinpoint the difference. My best guess is that it
> creates a fresh b-tree for the additional inserts, causing the boost in
> performance.

Yes, it'd be nice to understand what FTS is doing.  I can imagine many
ways to implement an index that has the performance characteristic
you've observed, but with various trade-offs.

Nico
--
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
