On Fri, May 16, 2014 at 1:22 PM, Peter Geoghegan <p...@heroku.com> wrote:
> I have performed a new benchmark related to my ongoing experimentation
> around caching and buffer manager scalability. The benchmark tests a
> minor refinement of the prototype patch previously posted . The
> patch itself is still very much a prototype, and does not
> significantly differ from what I originally posted. The big
> difference is usage_count starts at 6, and saturates at 30, plus I've
> tried to reduce the impact of the prior prototype's gettimeofday()
> calls by using clock_gettime() + CLOCK_MONOTONIC_COARSE. I previously
> posted some numbers for a patch with just the former change.
> I effectively disabled the background writer entirely here, since it
> never helps. These are unlogged tables, so as to not have the outcome
> obscured by checkpoint spikes during the sync phase that are more or
> less inevitable here (I believe this is particularly true given the
> hardware I'm using). Multiple client counts are tested, giving some
> indication of the impact on scalability. The same gains previously
> demonstrated in both transaction throughput and latency are once again
> clearly in evidence.
> I should emphasize that although I've talked a lot about LRU-K and
> other more sophisticated algorithms, this proof of concept still only
> adds a correlated reference period (while allowing usage_count to span
> a larger range). I have yet to come up with something really
> interesting, such as a patch that makes an inference about the
> frequency of access of a page based on the recency of its penultimate
> access (that is, a scheme that is similar to LRU-K, a scheme known to
> be used in other systems  and thought to be widely influential).
One point which I observed while reading the paper mentioned by
you '' is that, they are telling to have replacement policy based on
page type (index-root page, heap page, etc) which seems to be
relevant considering that proportion of read statements using index
scan is quite higher.
Another thing that caught my eye while reading LRU-K paper is to
give consideration for correlated reference pairs  in usage_count
maintenance (increment/decrement) , basically access
patterns for Intra-Transaction and Intra-Process (there is one
another mentioned in paper as Transaction-Retry which I am not
if it is much relevant).
: Definitions copied from LRU-K page, just for ease of reference
(a) Intra-Transaction. A transaction accesses a page,
then accesses the same page again before cmnrnitting. This
is likely to happen with certain update transactions, first
reading a row and later updating a value in the row.
(b) Transaction-Retry. A transaction accesses a page,
then aborts and is retried, and the retried transaction accesses
the page again for the same purpose.
(c) Intra-Process. A transaction references a page, then
commits, and the next transaction by the same process accesses
the page again. This pattern of access commonly
arises in batch update applications, which update 10 records
in sequence, commit, then start again by referencing the
next record on the same page.