On Tue, Apr 3, 2012 at 8:28 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
Might as well jump in with both feet:
autovacuum_naptime = 1s
autovacuum_vacuum_threshold = 1
autovacuum_vacuum_scale_factor = 0.0
If that smooths the latency peaks and doesn't hurt performance too
much, it's
Robert Haas wrote:
Kevin Grittner wrote:
I can't help thinking that the background hinter I had ideas
about writing would prevent many of the reads of old CLOG pages,
taking a lot of pressure off of this area. It just occurred to me
that the difference between that idea and having an
Robert Haas wrote:
I suppose one interesting question is to figure out if there's a way I
can optimize the disk configuration in this machine, or the Linux I/O
scheduler, or something, so as to reduce the amount of time it spends
waiting for the disk.
I'd be curious to know if using the
On Sun, Apr 1, 2012 at 11:12 PM, Greg Stark st...@mit.edu wrote:
On Sun, Apr 1, 2012 at 10:27 PM, Simon Riggs si...@2ndquadrant.com wrote:
So lock starvation on the control lock would cause a long wait after
each I/O, making it look like an I/O problem.
Except that both of the locks involved
On Mon, Apr 2, 2012 at 12:00 AM, Greg Stark st...@mit.edu wrote:
On Sun, Apr 1, 2012 at 4:05 AM, Robert Haas robertmh...@gmail.com wrote:
My guess based on previous testing is
that what's happening here is (1) we examine a tuple on an old page
and decide we must look up its XID, (2) the
On Mon, Apr 2, 2012 at 8:15 AM, Simon Riggs si...@2ndquadrant.com wrote:
Not true, please refer to code at line 544, as I already indicated.
My understanding of the instrumentation is that the lock acquired at
line 526 will show as the blocker until we reach line 555, so anything
in between
On Mon, Apr 2, 2012 at 11:49 AM, Greg Stark st...@mit.edu wrote:
On Mon, Apr 2, 2012 at 8:15 AM, Simon Riggs si...@2ndquadrant.com wrote:
Not true, please refer to code at line 544, as I already indicated.
My understanding of the instrumentation is that the lock acquired at
line 526 will show
On Mon, Apr 2, 2012 at 7:01 AM, Simon Riggs si...@2ndquadrant.com wrote:
Do you consider this proof that it can only be I/O? Or do we still
need to find out?
I stuck a bunch more debugging instrumentation into the SLRU code. It
was fairly clear from the previous round of instrumentation that
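For anyone wanting to reproduce that kind of measurement outside the server, here is a minimal standalone C sketch of the general approach (the helper names are invented for illustration, this is not the actual instrumentation patch): stamp the clock before and after the suspect section and complain about anything over a threshold.

#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

/* Hypothetical helpers, purely illustrative. */
static struct timespec section_start;

static void timed_section_begin(void)
{
    clock_gettime(CLOCK_MONOTONIC, &section_start);
}

static void timed_section_end(const char *label, double threshold_ms)
{
    struct timespec now;
    double ms;

    clock_gettime(CLOCK_MONOTONIC, &now);
    ms = (now.tv_sec - section_start.tv_sec) * 1000.0 +
         (now.tv_nsec - section_start.tv_nsec) / 1000000.0;
    if (ms > threshold_ms)
        fprintf(stderr, "%s took %.1f ms\n", label, ms);
}

int main(void)
{
    struct timespec nap = {0, 50 * 1000 * 1000};   /* 50 ms stand-in */

    timed_section_begin();
    nanosleep(&nap, NULL);          /* stands in for the slow SLRU write */
    timed_section_end("slru write", 10.0);
    return 0;
}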
Robert Haas robertmh...@gmail.com wrote:
This particular example shows the above chunk of code taking 13s
to execute. Within 3s, every other backend piles up behind that,
leading to the database getting no work at all done for a good ten
seconds.
My guess is that what's happening here is
Robert Haas robertmh...@gmail.com writes:
Long story short, when a CLOG-related stall happens,
essentially all the time is being spent in this here section of code:
/*
* If not part of Flush, need to fsync now. We assume this happens
* infrequently enough that it's not a
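To make the failure mode concrete, here is a small self-contained sketch, not taken from slru.c, of an fsync issued while a lock is held: one thread writes a page and fsyncs it under the lock, and every thread that merely wants the same lock waits out the entire fsync. Build with cc -pthread.

#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t page_lock = PTHREAD_MUTEX_INITIALIZER;
static int fd;

/* One "backend" writes a page and fsyncs while still holding the lock. */
static void *writer(void *arg)
{
    char page[8192] = {0};

    (void) arg;
    pthread_mutex_lock(&page_lock);
    if (pwrite(fd, page, sizeof(page), 0) != (ssize_t) sizeof(page))
        perror("pwrite");
    if (fsync(fd) != 0)             /* everyone below waits for this */
        perror("fsync");
    pthread_mutex_unlock(&page_lock);
    return NULL;
}

/* The other "backends" only want to look at the page, but still block. */
static void *reader(void *arg)
{
    (void) arg;
    pthread_mutex_lock(&page_lock);
    pthread_mutex_unlock(&page_lock);
    return NULL;
}

int main(void)
{
    pthread_t w, r[8];
    int i;

    fd = open("slru_toy.tmp", O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) { perror("open"); return 1; }

    pthread_create(&w, NULL, writer, NULL);
    for (i = 0; i < 8; i++)
        pthread_create(&r[i], NULL, reader, NULL);

    pthread_join(w, NULL);
    for (i = 0; i < 8; i++)
        pthread_join(r[i], NULL);

    close(fd);
    unlink("slru_toy.tmp");
    return 0;
}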
On Mon, Apr 2, 2012 at 8:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Robert Haas robertmh...@gmail.com writes:
Long story short, when a CLOG-related stall happens,
essentially all the time is being spent in this here section of code:
/*
* If not part of Flush, need to fsync now. We
Simon Riggs si...@2ndquadrant.com writes:
I suggest we optimise that by moving the dirty block into shared
buffers and leaving it as dirty. That way we don't need to write or
fsync at all and the bgwriter can pick up the pieces. So my earlier
patch to get the bgwriter to clean the clog would
On Mon, Apr 2, 2012 at 8:16 PM, Simon Riggs si...@2ndquadrant.com wrote:
Agreed, though I think it means the fsync is happening on a filesystem
that causes a full system fsync. That time is not normal.
I don't know what you mean. It looks like there are two cases where
this code path executes.
On Mon, Apr 2, 2012 at 12:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Robert Haas robertmh...@gmail.com writes:
Long story short, when a CLOG-related stall happens,
essentially all the time is being spent in this here section of code:
/*
* If not part of Flush, need to fsync now. We
On Apr 2, 2012, at 3:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Seems like basically what you've proven is that this code path *is* a
performance issue, and that we need to think a bit harder about how to
avoid doing the fsync while holding locks.
Hmm, good idea. I wonder if we couldn't just
On Apr 2, 2012, at 3:16 PM, Simon Riggs si...@2ndquadrant.com wrote:
Agreed, though I think it means the fsync is happening on a filesystem
that causes a full system fsync. That time is not normal.
It's ext4, which AFAIK does not have that problem.
...Robert

On Mon, Apr 2, 2012 at 12:58 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
I can't help thinking that the background hinter I had ideas about
writing would prevent many of the reads of old CLOG pages, taking a
lot of pressure off of this area. It just occurred to me that the
On Sun, Apr 1, 2012 at 4:05 AM, Robert Haas robertmh...@gmail.com wrote:
If I filter for waits greater than 8s, a somewhat different picture emerges:
2 waited at indexam.c:521 blocked by bufmgr.c:2475
212 waited at slru.c:310 blocked by slru.c:526
In other words, some of the waits
On Sun, Apr 1, 2012 at 7:07 AM, Simon Riggs si...@2ndquadrant.com wrote:
First, we need to determine that it is the clog where this is happening.
I can confirm that based on the LWLockIds. There were 32 of them
beginning at lock id 81, and a gdb session confirms that
On Sun, Apr 1, 2012 at 1:34 PM, Robert Haas robertmh...@gmail.com wrote:
On Sun, Apr 1, 2012 at 7:07 AM, Simon Riggs si...@2ndquadrant.com wrote:
First, we need to determine that it is the clog where this is happening.
I can confirm that based on the LWLockIds. There were 32 of them
On Sun, Apr 1, 2012 at 10:27 PM, Simon Riggs si...@2ndquadrant.com wrote:
So lock starvation on the control lock would cause a long wait after
each I/O, making it look like an I/O problem.
Except that both of the locks involved in his smoking gun occur
*after* the control lock has already been
On Sun, Apr 1, 2012 at 4:05 AM, Robert Haas robertmh...@gmail.com wrote:
My guess based on previous testing is
that what's happening here is (1) we examine a tuple on an old page
and decide we must look up its XID, (2) the relevant CLOG page isn't
in cache so we decide to read it, but (3) the
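That chain can be modelled schematically like this (two plain mutexes standing in for the buffer content lock and the CLOG control lock, a sleep standing in for the disk read; none of it is PostgreSQL code): the backend doing the visibility check keeps the content lock across the whole CLOG miss, so a backend that only wants the heap buffer eats the full I/O wait too. Build with cc -pthread.

#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdio.h>
#include <time.h>

static pthread_mutex_t buffer_content_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t clog_control_lock   = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for reading a CLOG page that is not in cache. */
static void read_clog_page_from_disk(void)
{
    struct timespec io = {0, 100 * 1000 * 1000};   /* pretend 100 ms of I/O */
    nanosleep(&io, NULL);
}

/* (1) examine a tuple, (2) need its XID status, (3) CLOG page miss -> I/O,
 * all while the buffer content lock stays held. */
static void *visibility_checker(void *arg)
{
    (void) arg;
    pthread_mutex_lock(&buffer_content_lock);
    pthread_mutex_lock(&clog_control_lock);
    read_clog_page_from_disk();
    pthread_mutex_unlock(&clog_control_lock);
    pthread_mutex_unlock(&buffer_content_lock);
    return NULL;
}

/* Another backend that only wants the heap buffer still eats the I/O wait. */
static void *innocent_bystander(void *arg)
{
    struct timespec t0, t1;

    (void) arg;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    pthread_mutex_lock(&buffer_content_lock);
    pthread_mutex_unlock(&buffer_content_lock);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("waited %.0f ms for the content lock\n",
           (t1.tv_sec - t0.tv_sec) * 1000.0 +
           (t1.tv_nsec - t0.tv_nsec) / 1000000.0);
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    struct timespec head_start = {0, 10 * 1000 * 1000};

    pthread_create(&a, NULL, visibility_checker, NULL);
    nanosleep(&head_start, NULL);   /* let the checker grab the lock first */
    pthread_create(&b, NULL, innocent_bystander, NULL);

    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}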
On Sat, Mar 31, 2012 at 4:41 AM, Robert Haas robertmh...@gmail.com wrote:
which means, if I'm not
confused here, that every single lwlock-related stall > 1s happened
while waiting for a buffer content lock. Moreover, each event
affected a different buffer. I find this result so surprising
On Sat, Mar 31, 2012 at 4:53 AM, Simon Riggs si...@2ndquadrant.com wrote:
The next logical step in measuring lock waits is to track the reason
for the lock wait, not just the lock wait itself.
I had the same thought. I'm not immediately sure what the best way to
do that is, but I'll see if I
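One possible shape for that, sketched standalone rather than against lwlock.c (the wrapper type and its fields are invented here for illustration): have the acquire path stamp the lock with the __FILE__/__LINE__ of its holder, and have any waiter report both its own location and the stamp it saw when it had to sleep. The output then comes out in the same "waited at X blocked by Y" form quoted elsewhere in the thread. Build with cc -pthread.

#include <pthread.h>
#include <stdio.h>

/* A mutex that remembers where its current holder acquired it. */
typedef struct
{
    pthread_mutex_t mutex;
    const char *holder_file;
    int         holder_line;
} traced_lock;

#define traced_acquire(lk) traced_acquire_impl((lk), __FILE__, __LINE__)

static void traced_acquire_impl(traced_lock *lk, const char *file, int line)
{
    if (pthread_mutex_trylock(&lk->mutex) != 0)
    {
        /* We are about to sleep: note who we think is blocking us.
         * This read is racy, but that is fine for a diagnostic. */
        const char *bfile = lk->holder_file;
        int         bline = lk->holder_line;

        pthread_mutex_lock(&lk->mutex);
        fprintf(stderr, "waited at %s:%d blocked by %s:%d\n",
                file, line, bfile ? bfile : "?", bline);
    }
    lk->holder_file = file;
    lk->holder_line = line;
}

static void traced_release(traced_lock *lk)
{
    lk->holder_file = NULL;
    lk->holder_line = 0;
    pthread_mutex_unlock(&lk->mutex);
}

int main(void)
{
    static traced_lock lk = { PTHREAD_MUTEX_INITIALIZER, NULL, 0 };

    traced_acquire(&lk);
    traced_release(&lk);
    return 0;
}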
On Sat, Mar 31, 2012 at 4:41 AM, Robert Haas robertmh...@gmail.com wrote:
But I didn't think we were ever supposed to hold content
locks for that long.
Isn't that lock held while doing visibility checks? What about I/O
waiting for a clog page to be read?
--
greg
On Sat, Mar 31, 2012 at 1:58 PM, Greg Stark st...@mit.edu wrote:
On Sat, Mar 31, 2012 at 4:41 AM, Robert Haas robertmh...@gmail.com wrote:
But I didn't think we were ever supposed to hold content
locks for that long.
Isn't that lock held while doing visibility checks? What about I/O
waiting
On Sat, Mar 31, 2012 at 8:58 AM, Greg Stark st...@mit.edu wrote:
On Sat, Mar 31, 2012 at 4:41 AM, Robert Haas robertmh...@gmail.com wrote:
But I didn't think we were ever supposed to hold content
locks for that long.
Isn't that lock held while doing visibility checks?
Nope. heap_update()
On Sat, Mar 31, 2012 at 10:14 PM, Robert Haas robertmh...@gmail.com wrote:
Isn't that lock held while doing visibility checks?
Nope. heap_update() and friends do a very complicated little dance to
avoid that.
...
What about I/O
waiting for a clog page to be read?
I'm pretty sure that can
On Sat, Mar 31, 2012 at 6:01 PM, Greg Stark st...@mit.edu wrote:
On Sat, Mar 31, 2012 at 10:14 PM, Robert Haas robertmh...@gmail.com wrote:
Isn't that lock held while doing visibility checks?
Nope. heap_update() and friends do a very complicated little dance to
avoid that.
...
What about
On Sat, Mar 31, 2012 at 9:29 PM, Robert Haas robertmh...@gmail.com wrote:
I've discovered a bug in my code that was causing it to print at most
2 histogram buckets per lwlock, which obviously means that my previous
results were totally inaccurate. Ah, the joys of benchmarking. I
found the
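For reference, the usual shape of such a histogram (a generic sketch, not the instrumentation patch itself): wait times bucketed by powers of two, one counter array per lock, with every non-empty bucket printed rather than just the first couple.

#include <stdio.h>

#define NBUCKETS 31

/* One counter per power-of-two microsecond range: bucket i holds waits of
 * roughly [2^i, 2^(i+1)) microseconds. */
static unsigned long wait_histogram[NBUCKETS];

static void record_wait(unsigned long wait_us)
{
    int bucket = 0;

    while (wait_us > 1 && bucket < NBUCKETS - 1)
    {
        wait_us >>= 1;
        bucket++;
    }
    wait_histogram[bucket]++;
}

static void print_histogram(void)
{
    int i;

    for (i = 0; i < NBUCKETS; i++)
        if (wait_histogram[i] != 0)      /* print every non-empty bucket */
            printf("%luus-%luus: %lu\n",
                   1UL << i, (1UL << (i + 1)) - 1, wait_histogram[i]);
}

int main(void)
{
    record_wait(3);
    record_wait(900);
    record_wait(1500000);    /* a 1.5 s stall lands in a high bucket */
    print_histogram();
    return 0;
}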