-Original Message-
From: Robert Haas [mailto:robertmh...@gmail.com]
Sent: Tuesday, April 09, 2013 9:28 AM
To: Amit Kapila
Cc: Greg Smith; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Page replacement algorithm in buffer cache
On Fri, Apr 5, 2013 at 11:08 PM, Amit Kapila
On Fri, Apr 5, 2013 at 11:08 PM, Amit Kapila amit.kap...@huawei.com wrote:
I still have one more doubt, consider the below scenario for cases when we
Invalidate buffers during moving to freelist v/s just move to freelist
Backend got the buffer from freelist for a request of page-9 (number 9
On Fri, Apr 5, 2013 at 1:12 AM, Amit Kapila amit.kap...@huawei.com wrote:
If we just put it to freelist, then next time if it get allocated directly
from bufhash table, then who will remove it from freelist
or do you think that, in BufferAlloc, if it gets from bufhash table, then it
should
On Saturday, April 06, 2013 12:38 AM Robert Haas wrote:
On Fri, Apr 5, 2013 at 1:12 AM, Amit Kapila amit.kap...@huawei.com
wrote:
If we just put it to freelist, then next time if it get allocated
directly
from bufhash table, then who will remove it from freelist
or do you think that, in
On Thursday, April 04, 2013 7:19 AM Greg Smith wrote:
On 4/2/13 11:54 AM, Robert Haas wrote:
But, having said that, I still think the best idea is what Andres
proposed, which pretty much matches my own thoughts: the bgwriter
needs to populate the free list, so that buffer allocations don't
On Wed, Apr 3, 2013 at 9:49 PM, Greg Smith g...@2ndquadrant.com wrote:
On 4/2/13 11:54 AM, Robert Haas wrote:
But, having said that, I still think the best idea is what Andres
proposed, which pretty much matches my own thoughts: the bgwriter
needs to populate the free list, so that buffer
On Thursday, April 04, 2013 6:12 PM Robert Haas wrote:
On Wed, Apr 3, 2013 at 9:49 PM, Greg Smith g...@2ndquadrant.com
wrote:
On 4/2/13 11:54 AM, Robert Haas wrote:
But, having said that, I still think the best idea is what Andres
proposed, which pretty much matches my own thoughts: the
On Tue, Apr 2, 2013 at 1:20 PM, Andres Freund and...@2ndquadrant.com wrote:
On 2013-04-02 12:56:56 -0400, Tom Lane wrote:
Andres Freund and...@2ndquadrant.com writes:
On 2013-04-02 12:22:03 -0400, Tom Lane wrote:
I agree in general, though I'm not sure the bgwriter process can
reasonably
On Wed, Apr 3, 2013 at 3:00 PM, Robert Haas robertmh...@gmail.com wrote:
The main hesitation I've had about actually implementing such a scheme
is that I find it a bit unappealing to have a background process
dedicated to just this. But maybe it could be combined with some of
the other ideas
Greg Stark st...@mit.edu writes:
I'm still skeptical about the idea of a freelist. That just seems like a
terrible point of contention. However perhaps that's because I'm picturing
an LRU linked list. Perhaps the right thing is to maintain a pool of
buffers in some less contention-prone data
On Thu, Apr 4, 2013 at 3:41 AM, Tom Lane t...@sss.pgh.pa.us wrote:
I think the original vision of the clock sweep algorithm included the
idea that different backends could be running the sweep over different
parts of the buffer ring concurrently. If we could get rid of the need
to serialize
On 4/2/13 11:54 AM, Robert Haas wrote:
But, having said that, I still think the best idea is what Andres
proposed, which pretty much matches my own thoughts: the bgwriter
needs to populate the free list, so that buffer allocations don't have
to wait for linear scans of the buffer array.
I was
On 2013-04-01 17:56:19 -0500, Jim Nasby wrote:
On 3/23/13 7:41 AM, Ants Aasma wrote:
Yes, having bgwriter do the actual cleaning up seems like a good idea.
The whole bgwriter infrastructure will need some serious tuning. There
are many things that could be shifted to background if we knew it
On Tue, Apr 2, 2013 at 6:32 AM, Andres Freund and...@2ndquadrant.com wrote:
On 2013-04-01 17:56:19 -0500, Jim Nasby wrote:
On 3/23/13 7:41 AM, Ants Aasma wrote:
Yes, having bgwriter do the actual cleaning up seems like a good idea.
The whole bgwriter infrastructure will need some serious
On Tue, Apr 2, 2013 at 1:53 AM, Merlin Moncure mmonc...@gmail.com wrote:
That seems pretty unlikely because of A sheer luck of hitting that
page for the dropout (if your buffer count is N the chances of losing
it would seem to be 1/N at most) and B highly used pages are much more
likely to be
On Tue, Apr 2, 2013 at 9:55 AM, Robert Haas robertmh...@gmail.com wrote:
On Tue, Apr 2, 2013 at 1:53 AM, Merlin Moncure mmonc...@gmail.com wrote:
That seems pretty unlikely because of A sheer luck of hitting that
page for the dropout (if your buffer count is N the chances of losing
it would
On Tue, Apr 2, 2013 at 11:32 AM, Merlin Moncure mmonc...@gmail.com wrote:
That's a very fair point, although not being able to evict pinned
buffers is a highly mitigating aspect. Also CLOG is a different beast
entirely -- it's much more dense (2 bits!) vs a tuple so a single page
can a lot of
Robert Haas robertmh...@gmail.com writes:
But, having said that, I still think the best idea is what Andres
proposed, which pretty much matches my own thoughts: the bgwriter
needs to populate the free list, so that buffer allocations don't have
to wait for linear scans of the buffer array.
On 2013-04-02 11:54:32 -0400, Robert Haas wrote:
On Tue, Apr 2, 2013 at 11:32 AM, Merlin Moncure mmonc...@gmail.com wrote:
That's a very fair point, although not being able to evict pinned
buffers is a highly mitigating aspect. Also CLOG is a different beast
entirely -- it's much more
On 2013-04-02 12:22:03 -0400, Tom Lane wrote:
Robert Haas robertmh...@gmail.com writes:
But, having said that, I still think the best idea is what Andres
proposed, which pretty much matches my own thoughts: the bgwriter
needs to populate the free list, so that buffer allocations don't have
Andres Freund and...@2ndquadrant.com writes:
On 2013-04-02 12:22:03 -0400, Tom Lane wrote:
I agree in general, though I'm not sure the bgwriter process can
reasonably handle this need along with what it's already supposed to be
doing. We may need another background process that is just
On 2013-04-02 12:56:56 -0400, Tom Lane wrote:
Andres Freund and...@2ndquadrant.com writes:
On 2013-04-02 12:22:03 -0400, Tom Lane wrote:
I agree in general, though I'm not sure the bgwriter process can
reasonably handle this need along with what it's already supposed to be
doing. We may
I'm confused by this thread. We *used* to maintain an LRU. The whole
reason for the clock-sweep algorithm is precisely to avoid maintaining
a linked list of least recently used buffers since the head of that
list is a point of contention.
--
Sent via pgsql-hackers mailing list
On 2013-04-02 18:26:23 +0100, Greg Stark wrote:
I'm confused by this thread. We *used* to maintain an LRU. The whole
reason for the clock-sweep algorithm is precisely to avoid maintaining
a linked list of least recently used buffers since the head of that
list is a point of contention.
I
On Tue, Apr 2, 2013 at 9:24 PM, Robert Haas robertmh...@gmail.com wrote:
One thought I had for fiddling with usage_count is to make it grow
additively (x = x + 1) and decay exponentially (x = x 1). I'm not
sure the idea is any good, but one problem with the current system is
that it's
On Tue, Apr 2, 2013 at 12:50 PM, Atri Sharma atri.j...@gmail.com wrote:
On Tue, Apr 2, 2013 at 9:24 PM, Robert Haas robertmh...@gmail.com wrote:
One thought I had for fiddling with usage_count is to make it grow
additively (x = x + 1) and decay exponentially (x = x 1). I'm not
sure the idea
Sent from my iPad
On 02-Apr-2013, at 23:41, Merlin Moncure mmonc...@gmail.com wrote:
On Tue, Apr 2, 2013 at 12:50 PM, Atri Sharma atri.j...@gmail.com wrote:
On Tue, Apr 2, 2013 at 9:24 PM, Robert Haas robertmh...@gmail.com wrote:
One thought I had for fiddling with usage_count is to make
On Sun, Mar 31, 2013 at 1:27 PM, Jeff Janes jeff.ja...@gmail.com wrote:
On Friday, March 22, 2013, Ants Aasma wrote:
On Fri, Mar 22, 2013 at 10:22 PM, Merlin Moncure mmonc...@gmail.com
wrote:
well if you do a non-locking test first you could at least avoid some
cases (and, if you get the
On Mon, Apr 1, 2013 at 9:28 AM, Merlin Moncure mmonc...@gmail.com wrote:
That is one of multiple issues. Contention on the BufFreelistLock is
another one. I agree that usage_count maintenance is unlikely to become a
bottleneck unless one or both of those is fixed first (and maybe not even
On Mon, Apr 1, 2013 at 10:03 AM, Robert Haas robertmh...@gmail.com wrote:
On Mon, Apr 1, 2013 at 9:28 AM, Merlin Moncure mmonc...@gmail.com wrote:
That is one of multiple issues. Contention on the BufFreelistLock is
another one. I agree that usage_count maintenance is unlikely to become a
On Mon, Apr 1, 2013 at 11:55:07AM -0500, Merlin Moncure wrote:
In fact, BufFreelistLock is really misnamed, because for the most
part, the free list as we implement is going to be empty. What the
BufFreelistLock is really doing is serializing the process of scanning
for a free buffer. I
On Mon, Apr 1, 2013 at 3:32 PM, Bruce Momjian br...@momjian.us wrote:
On Mon, Apr 1, 2013 at 11:55:07AM -0500, Merlin Moncure wrote:
In fact, BufFreelistLock is really misnamed, because for the most
part, the free list as we implement is going to be empty. What the
BufFreelistLock is
On 2013-04-01 08:28:13 -0500, Merlin Moncure wrote:
On Sun, Mar 31, 2013 at 1:27 PM, Jeff Janes jeff.ja...@gmail.com wrote:
On Friday, March 22, 2013, Ants Aasma wrote:
On Fri, Mar 22, 2013 at 10:22 PM, Merlin Moncure mmonc...@gmail.com
wrote:
well if you do a non-locking test first
On Mon, Apr 1, 2013 at 4:09 PM, Andres Freund and...@2ndquadrant.com wrote:
On 2013-04-01 08:28:13 -0500, Merlin Moncure wrote:
On Sun, Mar 31, 2013 at 1:27 PM, Jeff Janes jeff.ja...@gmail.com wrote:
On Friday, March 22, 2013, Ants Aasma wrote:
On Fri, Mar 22, 2013 at 10:22 PM, Merlin
On 3/23/13 7:41 AM, Ants Aasma wrote:
On Sat, Mar 23, 2013 at 6:04 AM, Jim Nasby j...@nasby.net wrote:
Partitioned clock sweep strikes me as a bad idea... you could certainly get
unlucky and end up with a lot of hot stuff in one partition.
Surely that is not worse than having everything in a
On 4/1/13 4:55 PM, Merlin Moncure wrote:
On Mon, Apr 1, 2013 at 4:09 PM, Andres Freundand...@2ndquadrant.com wrote:
On 2013-04-01 08:28:13 -0500, Merlin Moncure wrote:
On Sun, Mar 31, 2013 at 1:27 PM, Jeff Janesjeff.ja...@gmail.com wrote:
On Friday, March 22, 2013, Ants Aasma wrote:
On
On 3/23/13 4:43 AM, Amit Kapila wrote:
I have tried one of the idea's : Adding the buffers background writer finds
reusable to freelist.
http://www.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C382852FF97@szxeml509-mbs
This can reduce the clock swipe as it can find buffers from
I don't be believe there is any reasonable argument that
sitting and spinning while holding the BufFreelistLock is a good idea.
I completely agree. The idea of spinning for a lock while already
inside a lock seems like a source of a hit to performance.
Regards,
Atri
--
Regards,
Atri
On Mon, Apr 1, 2013 at 6:09 PM, Jim Nasby j...@nasby.net wrote:
On 4/1/13 4:55 PM, Merlin Moncure wrote:
On Mon, Apr 1, 2013 at 4:09 PM, Andres Freundand...@2ndquadrant.com
wrote:
On 2013-04-01 08:28:13 -0500, Merlin Moncure wrote:
On Sun, Mar 31, 2013 at 1:27 PM, Jeff
-Original Message-
From: Jim Nasby [mailto:j...@nasby.net]
Sent: Tuesday, April 02, 2013 4:43 AM
To: Amit Kapila
Cc: 'Ants Aasma'; 'Merlin Moncure'; 'Tom Lane'; 'Atri Sharma'; 'Greg
Stark'; 'PostgreSQL-development'
Subject: Re: [HACKERS] Page replacement algorithm in buffer cache
On Friday, March 22, 2013, Ants Aasma wrote:
On Fri, Mar 22, 2013 at 10:22 PM, Merlin Moncure
mmonc...@gmail.comjavascript:;
wrote:
well if you do a non-locking test first you could at least avoid some
cases (and, if you get the answer wrong, so what?) by jumping to the
next buffer
On Fri, Mar 22, 2013 at 06:06:18PM +, Greg Stark wrote:
On Fri, Mar 22, 2013 at 2:02 PM, Tom Lane t...@sss.pgh.pa.us wrote:
And we definitely looked at ARC
We didn't just look at it. At least one release used it. Then patent
issues were raised (and I think the implementation had some
On Fri, Mar 22, 2013 at 04:16:18PM -0400, Tom Lane wrote:
Merlin Moncure mmonc...@gmail.com writes:
I think there is some very low hanging optimization fruit in the clock
sweep loop. first and foremost, I see no good reason why when
scanning pages we have to spin and wait on a buffer in
On Tue, Mar 26, 2013 at 11:40 AM, Bruce Momjian br...@momjian.us wrote:
On Fri, Mar 22, 2013 at 04:16:18PM -0400, Tom Lane wrote:
Merlin Moncure mmonc...@gmail.com writes:
I think there is some very low hanging optimization fruit in the clock
sweep loop. first and foremost, I see no good
Perhaps this isn't the help you were looking for, but I spent a long time
looking into this a few years ago. Then I stopped and decided to work on
other things. I would recommend you do so too.
Agreed. It seems that my concerns were not valid, and since you have
already done some testing
I'll have to take a look. Removing *all spinning* from from page
allocation though feels like it might be worthwhile to test (got to
give some bonus points for being a very local change and simple to
implement). I wonder if with more shared buffers you tend to sweep
more buffers per
On Sun, Mar 24, 2013 at 9:29 AM, Atri Sharma atri.j...@gmail.com wrote:
I'll have to take a look. Removing *all spinning* from from page
allocation though feels like it might be worthwhile to test (got to
give some bonus points for being a very local change and simple to
implement). I wonder
On 3/22/13 8:45 AM, Ants Aasma wrote:
However, I think the main issue isn't finding new algorithms that are
better in some specific circumstances. The hard part is figuring out
whether their performance is better in general.
Right. The current page replacement method works as expected. Many
On Sun, Mar 24, 2013 at 6:11 AM, Greg Smith g...@2ndquadrant.com wrote:
On 3/22/13 8:45 AM, Ants Aasma wrote:
However, I think the main issue isn't finding new algorithms that are
better in some specific circumstances. The hard part is figuring out
whether their performance is better in
If we
use the value calculated locally to decide on eviction, the highly
used buffers where this is likely will get at least one clock sweep
cycle of grace time. If they are indeed highly used it's likely that
someone will manage to bump the usage_count in the meanwhile. If they
are not hot,
On Sun, Mar 24, 2013 at 7:03 PM, Atri Sharma atri.j...@gmail.com wrote:
So, essentially, we decide locally which page to evict, then try to
get a lock on the header only when we want to evict the victim page?
Sounds like the contention for the header should go down considerably.
Not exactly
On 3/22/13 7:27 PM, Ants Aasma wrote:
On Fri, Mar 22, 2013 at 10:22 PM, Merlin Moncure mmonc...@gmail.com wrote:
well if you do a non-locking test first you could at least avoid some
cases (and, if you get the answer wrong, so what?) by jumping to the
next buffer immediately. if the non
On Saturday, March 23, 2013 9:34 AM Jim Nasby wrote:
On 3/22/13 7:27 PM, Ants Aasma wrote:
On Fri, Mar 22, 2013 at 10:22 PM, Merlin Moncure mmonc...@gmail.com
wrote:
One other interesting idea I have seen is closeable scalable nonzero
indication (C-SNZI) from scalable rw-locks [1]. The
On Sat, Mar 23, 2013 at 6:29 AM, Atri Sharma atri.j...@gmail.com wrote:
One way to distribute memory contention in case of spinlocks could be
to utilize the fundamentals of NUMA architecture. Specifically, we can
let the contending backends spin on local flags instead on the buffer
header
On Sat, Mar 23, 2013 at 6:04 AM, Jim Nasby j...@nasby.net wrote:
Partitioned clock sweep strikes me as a bad idea... you could certainly get
unlucky and end up with a lot of hot stuff in one partition.
Surely that is not worse than having everything in a single partition.
Given a decent
Partitioned clock sweep strikes me as a bad idea... you could certainly get
unlucky and end up with a lot of hot stuff in one partition.
Another idea that'sbeen broughht up inthe past is to have something in the
background keep a minimum number of buffers on the free list. That's how OS
VM
On Thu, Mar 21, 2013 at 9:51 PM, Atri Sharma atri.j...@gmail.com wrote:
Hello all,
Sorry if this is a naive question.
I was going through Greg Smith's slides on buffer
cache(
http://www.westnet.com/~gsmith/content/postgresql/InsideBufferCache.pdf).
When going through the page replacement
On Fri, Mar 22, 2013 at 7:27 PM, Ants Aasma a...@cybertec.at wrote:
On Fri, Mar 22, 2013 at 10:22 PM, Merlin Moncure mmonc...@gmail.com wrote:
well if you do a non-locking test first you could at least avoid some
cases (and, if you get the answer wrong, so what?) by jumping to the
next buffer
Jeff Janes jeff.ja...@gmail.com writes:
I'm more convinced in the other direction, new pages should enter with 0
rather than with 1. I think that the argument that a new buffer needs to
be given more of an opportunity to get used again is mostly bogus.
IIRC, the argument for starting at 1 not
On Fri, Mar 22, 2013 at 4:06 AM, Atri Sharma atri.j...@gmail.com wrote:
Not yet, I figured this might be a problem and am designing test cases
for the same. I would be glad for some help there please.
Perhaps this isn't the help you were looking for, but I spent a long time
looking into
Sent from my iPad
On 22-Mar-2013, at 11:28, Amit Kapila amit.kap...@huawei.com wrote:
On Friday, March 22, 2013 10:22 AM Atri Sharma wrote:
Hello all,
Sorry if this is a naive question.
I was going through Greg Smith's slides on buffer
On Friday, March 22, 2013 12:00 PM Atri Sharma wrote:
Sent from my iPad
On 22-Mar-2013, at 11:28, Amit Kapila amit.kap...@huawei.com wrote:
On Friday, March 22, 2013 10:22 AM Atri Sharma wrote:
Hello all,
Sorry if this is a naive question.
I was going through Greg Smith's
I think that if the initialization of USAGE_COUNT starts at the maximum
allowed value instead of one, we can have a better solution to this
problem.
So what is your idea, if you start at maximum, what we will do for further
accesses to it?
I havent chalked out a detailed plan yet, but I
On Friday, March 22, 2013 4:16 PM Atri Sharma wrote:
I think that if the initialization of USAGE_COUNT starts at the
maximum
allowed value instead of one, we can have a better solution to this
problem.
So what is your idea, if you start at maximum, what we will do for
further
What would you do if the only young page has usage count zero during second
sweep.
UmmThe same approach we take when there is no page with usage
count zero in a sweep in the current algorithm?
I don't think introducing another factor along with usage count would do any
much help.
On Friday, March 22, 2013 4:36 PM Atri Sharma wrote:
What would you do if the only young page has usage count zero during
second
sweep.
UmmThe same approach we take when there is no page with usage
count zero in a sweep in the current algorithm?
It would give more priority to young
On Fri, Mar 22, 2013 at 4:53 PM, Amit Kapila amit.kap...@huawei.com wrote:
On Friday, March 22, 2013 4:36 PM Atri Sharma wrote:
What would you do if the only young page has usage count zero during
second
sweep.
UmmThe same approach we take when there is no page with usage
count zero
On Mar 22, 2013 12:46 PM, Atri Sharma atri.j...@gmail.com wrote:
This is the one I think would work out best, add an age factor as to
the time duration which an entry has spent in the cache along with its
usage count.
You might want to check out the LIRS cache replacement algorithm [1].
That
However, I think the main issue isn't finding new algorithms that are
better in some specific circumstances. The hard part is figuring out
whether their performance is better in general. My idea was to create
a patch to capture page pinning traffic from PostgreSQL (maybe stream
out into a
Ants Aasma a...@cybertec.at writes:
You might want to check out the LIRS cache replacement algorithm [1].
That algorithm tries to estimate least frequently used instead of
least recently used. Mysql uses it for their buffer replacement
policy. There is also a clock sweep based approximation
On Fri, Mar 22, 2013 at 2:02 PM, Tom Lane t...@sss.pgh.pa.us wrote:
And we definitely looked at ARC
We didn't just look at it. At least one release used it. Then patent
issues were raised (and I think the implementation had some contention
problems).
--
greg
--
Sent via pgsql-hackers
On Fri, Mar 22, 2013 at 11:36 PM, Greg Stark st...@mit.edu wrote:
On Fri, Mar 22, 2013 at 2:02 PM, Tom Lane t...@sss.pgh.pa.us wrote:
And we definitely looked at ARC
We didn't just look at it. At least one release used it. Then patent
issues were raised (and I think the implementation had
On Fri, Mar 22, 2013 at 1:13 PM, Atri Sharma atri.j...@gmail.com wrote:
On Fri, Mar 22, 2013 at 11:36 PM, Greg Stark st...@mit.edu wrote:
On Fri, Mar 22, 2013 at 2:02 PM, Tom Lane t...@sss.pgh.pa.us wrote:
And we definitely looked at ARC
We didn't just look at it. At least one release used
Merlin Moncure mmonc...@gmail.com writes:
On Fri, Mar 22, 2013 at 1:13 PM, Atri Sharma atri.j...@gmail.com wrote:
What is the general thinking? Is it time to start testing again and
thinking about improvements to the current algorithm?
well, what problem are you trying to solve exactly? the
On Fri, Mar 22, 2013 at 2:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Merlin Moncure mmonc...@gmail.com writes:
On Fri, Mar 22, 2013 at 1:13 PM, Atri Sharma atri.j...@gmail.com wrote:
What is the general thinking? Is it time to start testing again and
thinking about improvements to the current
Merlin Moncure mmonc...@gmail.com writes:
I think there is some very low hanging optimization fruit in the clock
sweep loop. first and foremost, I see no good reason why when
scanning pages we have to spin and wait on a buffer in order to
pedantically adjust usage_count. some simple
On Fri, Mar 22, 2013 at 3:16 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Merlin Moncure mmonc...@gmail.com writes:
I think there is some very low hanging optimization fruit in the clock
sweep loop. first and foremost, I see no good reason why when
scanning pages we have to spin and wait on a
On Fri, Mar 22, 2013 at 10:22 PM, Merlin Moncure mmonc...@gmail.com wrote:
well if you do a non-locking test first you could at least avoid some
cases (and, if you get the answer wrong, so what?) by jumping to the
next buffer immediately. if the non locking test comes good, only
then do you
Moreover, if the buffer happens to miss a decrement due to a data
race, there's a good chance that the buffer is heavily used and
wouldn't need to be evicted soon anyway. (if you arrange it to be a
read-test-inc/dec-store operation then you will never go out of
bounds) However, clocksweep
On Friday, March 22, 2013 10:22 AM Atri Sharma wrote:
Hello all,
Sorry if this is a naive question.
I was going through Greg Smith's slides on buffer
cache(http://www.westnet.com/~gsmith/content/postgresql/InsideBufferCac
he.pdf).
When going through the page replacement algorithm that we
80 matches
Mail list logo