Re: [sqlite] index has higher cache priority than data?

2012-09-21 Thread Ryan Johnson

On 21/09/2012 7:54 AM, Clemens Ladisch wrote:

John Bachir wrote:

i've read other posts on this list that say that we can't guess what sqlite
will do with cache.

It uses a simple LRU algorithm to determine which pages to kick out of
the page cache first (so at least it's somewhat deterministic).


however, could i be relatively confident that most of the time, it
will prioritize keeping the index in memory before it starts keeping
the data?

The page cache does not know what is in the pages.

Let's look at a simple example: assume the index has two pages, X and Y,
which each point to records in three data pages:
   X -> A,B,C; Y -> D,E,F

The order in which the pages would be used is this:
   X A X B X C Y D Y E Y F

For LRU, the last usage matters, so the LRU list will look like this:
   A B X C D E Y F

So the data pages _will_ crowd out the index pages (especially when
there are much fewer index then data pages ).


ideally it would always keep the entire index in memory and never
cache the data.

if i can't more or less depend on this, then sqlite probably won't
work for my application.

You could write your own page cache implementation that wraps the
original one but never throws out certain pages ...

This might help: http://www.sqlite.org/capi3ref.html#sqlite3_pcache
By implementing a custom page cache using this API, an application can 
better control the amount of memory consumed by SQLite, the way in 
which that memory is allocated and released, and the policies used to 
determine exactly which parts of a database file are cached and for 
how long.


AFAICT, a pluggable cache only needs to worry about memory, with all I/O 
handled by sqlite3. It shouldn't be too hard to cook up a minimal 
version. I'm a bit doubtful on the "exactly which parts of a file" 
claim, since the API doesn't tell you anything about the pages it asks 
you to cache. However, the "clock" algorithm would probably do you want, 
without needing to know which pages actually belong to an index: it 
prefers to evict pages that have been touched the fewest times (with 
decay), rather than those that have gone the longest since their last 
touch.


Advantages:
- Popular pages are hard to evict, but become unpopular if left 
untouched too long
- Simpler code (a for loop and a simple counter at each cache slot, vs. 
some sorted data structure for LRU)
- Lower runtime overhead (amortized constant cost per unpin vs. 
logarithmic cost for LRU)


To implement Clock: arrange the cache slots in a circle, and keep a 
remembered position, the "hand." Whenever the system needs to allocate a 
new page, the "hand" sweeps around the circle looking for an unpinned 
page with zero touch count, and evicts the first such page; the touch 
count increments whenever the page is unpinned, and decrements whenever 
the "hand" passes it. Any page unpinned at least once per clock cycle 
will remain in cache, with memory pressure making clock cycles shorter.


Note that, while any given eviction can require looking at multiple 
pages, it usually averages out to only a few pages per eviction. The 
worst case would be if an adversary touched N-1 popular pages once for 
each unpopular page it fetches: the unpopular page would be evicted 
every time, but clock would have to sweep the whole pool to discover 
this. Even then, though, you pay cost N to evict a page once every N 
unpins.


Ryan

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] index has higher cache priority than data?

2012-09-21 Thread Simon Slavin

On 21 Sep 2012, at 12:51pm, _ph_  wrote:

> I've seen that the timre required to create that snapshot depends largely on
> the size of the entire table, even if only selected columns go into the
> snapshot. 
> 
> I.e. 
> 
>   ATTACH ':memory:' AS mem;
>  INSERT INTO mem.Snapshot SELECT col1, col2 FROM Data;
> 
> is much slower if 'Data' contains an additional column with large data.
> Moving my item meta data (small) into a table separate from the possibly
> large blobs helped immensely. 

As an alternative to moving the BLOBs out, it may speed up a lot of you make an 
index which contains just columns col1 and col2.  Do this just one

CREATE INDEX fastCol1Col2 ON Data (col1,col2);
ANALYZE

Then do

INSERT INTO mem.Snapshot SELECT col1, col2 FROM Data ORDER BY col1,col2;

SQLite will realise it can get all the data it needs from the index it's 
already using.  It will never read the TABLE data at all.  However the actual 
speed advantage you get it strongly tied to your hardware setup, so do some 
testing first to see if it helps as much as I think it will.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] index has higher cache priority than data?

2012-09-21 Thread _ph_
There is also the compile-time option

SQLITE_DIRECT_OVERFLOW_READ

(see here: http://www.sqlite.org/compile.html)
that makes content in overflow pages bypass the SQlite page cache. In my
understanding, that should help if the majority data consists of large BLOB
/ String cells. 

(It would be nice if this option could be set through a pragma, for
performance comparisons)





--
View this message in context: 
http://sqlite.1065341.n5.nabble.com/index-has-higher-cache-priority-than-data-tp64393p64399.html
Sent from the SQLite mailing list archive at Nabble.com.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] index has higher cache priority than data?

2012-09-21 Thread Clemens Ladisch
John Bachir wrote:
> i've read other posts on this list that say that we can't guess what sqlite
> will do with cache.

It uses a simple LRU algorithm to determine which pages to kick out of
the page cache first (so at least it's somewhat deterministic).

> however, could i be relatively confident that most of the time, it
> will prioritize keeping the index in memory before it starts keeping
> the data?

The page cache does not know what is in the pages.

Let's look at a simple example: assume the index has two pages, X and Y,
which each point to records in three data pages:
  X -> A,B,C; Y -> D,E,F

The order in which the pages would be used is this:
  X A X B X C Y D Y E Y F

For LRU, the last usage matters, so the LRU list will look like this:
  A B X C D E Y F

So the data pages _will_ crowd out the index pages (especially when
there are much fewer index then data pages ).

> ideally it would always keep the entire index in memory and never
> cache the data.
>
> if i can't more or less depend on this, then sqlite probably won't
> work for my application.

You could write your own page cache implementation that wraps the
original one but never throws out certain pages ...


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] index has higher cache priority than data?

2012-09-21 Thread _ph_
I have a little experience with a somewhat similar setup: a typical "real"
file is 200MB...2GB. I am making a "snapshot" of the data structure (few
hundred K) into an attached in-memory-database. 

I've seen that the timre required to create that snapshot depends largely on
the size of the entire table, even if only selected columns go into the
snapshot. 

I.e. 

   ATTACH ':memory:' AS mem;
  INSERT INTO mem.Snapshot SELECT col1, col2 FROM Data;

is much slower if 'Data' contains an additional column with large data.
Moving my item meta data (small) into a table separate from the possibly
large blobs helped immensely. 

Note: I haven't investigated that much, as separating the large data column
into a separate table makes sense for other reasons. It could be that the
significant difference - even though very consistent to observe with
multiple files - was more due to OS/disk caching than SQLite itself. 




--
View this message in context: 
http://sqlite.1065341.n5.nabble.com/index-has-higher-cache-priority-than-data-tp64393p64397.html
Sent from the SQLite mailing list archive at Nabble.com.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] index has higher cache priority than data?

2012-09-21 Thread John Bachir
I have an application with the following behavior:

   - Data is large -- gigabytes or 10s of gigabytes (will never fit into
   memory)
   - the indexes for the data is small -- will definitely fit into memory
   - it will be relatively rare that any particular data is accessed more
   often than others

for these reasons, it would never be advantageous for sqlite to ever cache
the data into memory. ideally it would always keep the entire index in
memory and never cache the data.

i've read other posts on this list that say that we can't guess what sqlite
will do with cache. however, could i be relatively confident that most of
the time, it will prioritize keeping the index in memory before it starts
keeping the data? if i can't more or less depend on this, then sqlite
probably won't work for my application.

Thanks,
John
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users