Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-24 Thread Bruce Momjian
Applied. --- Bruce Momjian wrote: Tom Lane wrote: Bruce Momjian br...@momjian.us writes: OK, I have attached a proposed patch to improve this. I moved the pg_clog mention to a new paragraph and linked it to the

Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-23 Thread Alvaro Herrera
Excerpts from Bruce Momjian's message of dom ago 22 12:51:47 -0400 2010: Well, the reason that value is 200 million is for pg_clog cleanup, not for xid wraparound protection. The next sentence does relate to xid wraparound, but it seems to fit because the previous sentence ends with xid
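
As a rough illustration of why the 200 million default matters for pg_clog, the arithmetic can be sketched as below. This assumes the commit log stores 2 bits of status per transaction (4 transactions per byte); the helper name `clog_bytes_for_xids` is hypothetical and not part of PostgreSQL.

```c
#include <stdint.h>

/* Assumption: pg_clog keeps 2 status bits per transaction, so 4 per byte. */
#define XACTS_PER_CLOG_BYTE 4

/*
 * Hypothetical helper: approximate number of pg_clog bytes needed to
 * retain commit status for xid_count transactions (rounded up).
 */
static uint64_t clog_bytes_for_xids(uint64_t xid_count)
{
    return (xid_count + XACTS_PER_CLOG_BYTE - 1) / XACTS_PER_CLOG_BYTE;
}
```

Under that assumption, `clog_bytes_for_xids(200000000)` gives 50,000,000 bytes, so the default retains on the order of 50 MB of pg_clog, while a setting of two billion would retain roughly ten times that.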

Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-23 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes: Excerpts from Bruce Momjian's message of dom ago 22 12:51:47 -0400 2010: Do you have a suggestion? Reorder the items? I'd add another para before that one saying that this value also affects pg_clog truncation. I agree that putting pg_clog

Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-23 Thread Alvaro Herrera
Excerpts from Tom Lane's message of lun ago 23 12:40:32 -0400 2010: Alvaro Herrera alvhe...@commandprompt.com writes: Excerpts from Bruce Momjian's message of dom ago 22 12:51:47 -0400 2010: Do you have a suggestion? Reorder the items? I'd add another para before that one saying that

Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-23 Thread Josh Berkus
All, FYI, the system which sparked this original discussion turned out, on extensive analysis, to have ZFS-level filesystem corruption. The polling issues were related to that rather than to Postgres. -- Josh Berkus

Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-23 Thread Bruce Momjian
Alvaro Herrera wrote: Excerpts from Tom Lane's message of lun ago 23 12:40:32 -0400 2010: Alvaro Herrera alvhe...@commandprompt.com writes: Excerpts from Bruce Momjian's message of dom ago 22 12:51:47 -0400 2010: Do you have a suggestion? Reorder the items? I'd add another para

Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-23 Thread Tom Lane
Bruce Momjian br...@momjian.us writes: OK, I have attached a proposed patch to improve this. I moved the pg_clog mention to a new paragraph and linked it to the reason the default is relatively low. The references to vacuum freeze are incorrect; autovacuum does NOT do the equivalent of VACUUM

Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-23 Thread Alvaro Herrera
Excerpts from Bruce Momjian's message of lun ago 23 14:55:55 -0400 2010: OK, I have attached a proposed patch to improve this. I moved the pg_clog mention to a new paragraph and linked it to the reason the default is relatively low. Comments? I think the new para doesn't make much sense,

Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-23 Thread Bruce Momjian
Tom Lane wrote: Bruce Momjian br...@momjian.us writes: OK, I have attached a proposed patch to improve this. I moved the pg_clog mention to a new paragraph and linked it to the reason the default is relatively low. The references to vacuum freeze are incorrect; autovacuum does NOT do

Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-23 Thread Bruce Momjian
Alvaro Herrera wrote: Excerpts from Bruce Momjian's message of lun ago 23 14:55:55 -0400 2010: OK, I have attached a proposed patch to improve this. I moved the pg_clog mention to a new paragraph and linked it to the reason the default is relatively low. Comments? I think the new

Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-22 Thread Rob Wultsch
For a documentation patch should this not be back ported to all relevant versions? On 8/21/10, Bruce Momjian br...@momjian.us wrote: Josh Berkus wrote: On further reflection, though: since we put in the BufferAccessStrategy code, which was in 8.3, the background writer isn't *supposed* to

Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-22 Thread Bruce Momjian
Rob Wultsch wrote: For a documentation patch should this not be back ported to all relevant versions? It is only a minor adjustment and I normally don't backpatch that. --- On 8/21/10, Bruce Momjian br...@momjian.us

Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-22 Thread Tom Lane
Bruce Momjian br...@momjian.us writes: We often mention that we do vacuum freeze for anti-wraparound vacuum, but not for pg_clog file removal, which is the primary trigger for autovacuum vacuum freezing. I have added the attached documentation patch for autovacuum_freeze_max_age;

Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-22 Thread Bruce Momjian
Tom Lane wrote: Bruce Momjian br...@momjian.us writes: We often mention that we do vacuum freeze for anti-wraparound vacuum, but not for pg_clog file removal, which is the primary trigger for autovacuum vacuum freezing. I have added the attached documentation patch for

Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-21 Thread Bruce Momjian
Josh Berkus wrote: On further reflection, though: since we put in the BufferAccessStrategy code, which was in 8.3, the background writer isn't *supposed* to be very much involved in writing pages that are dirtied by VACUUM. VACUUM runs in a small ring of buffers and is supposed to have

Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-18 Thread Josh Berkus
What I find interesting about that trace is the large proportion of writes. That appears to me to indicate that it's *not* a matter of vacuum delays, or at least not just a matter of that. The process seems to be getting involved in having to dump dirty buffers to disk. Perhaps the

Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-18 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes: What I find interesting about that trace is the large proportion of writes. That appears to me to indicate that it's *not* a matter of vacuum delays, or at least not just a matter of that. The process seems to be getting involved in having to dump dirty

Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-18 Thread Josh Berkus
On further reflection, though: since we put in the BufferAccessStrategy code, which was in 8.3, the background writer isn't *supposed* to be very much involved in writing pages that are dirtied by VACUUM. VACUUM runs in a small ring of buffers and is supposed to have to clean its own dirt

Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-18 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes: Rather, what you need to be thinking about is how come vacuum seems to be making lots of pages dirty on only one of these machines. This is an anti-wraparound vacuum, so it could have something to do with the hint bits. Maybe it's setting the freeze bit

Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-18 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote: Josh Berkus j...@agliodbs.com writes: This is an anti-wraparound vacuum, so it could have something to do with the hint bits. Maybe it's setting the freeze bit on every page, and writing them one page at a time? That would explain all the writes, but it

Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-18 Thread Josh Berkus
That would explain all the writes, but it doesn't seem to explain why your two servers aren't behaving similarly. Well, that's why I said ostensibly identical. There may in fact be differences, not just in the databases but in some OS libs as well. These servers have been in production for

Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-18 Thread Josh Berkus
Tested that. It does look like if I increase vacuum_cost_limit to 1 and lower vacuum_cost_page_dirty to 10, it reads 5-7 pages and writes 2-3 before each pollsys. The math seems completely wrong on that, though -- it should be 50 and 30 pages, or similar. If I can, I'll test a vacuum

Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-18 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes: Most likely that's the libc implementation of the select()-based sleeps for vacuum_cost_delay. I'm still suspicious that the writes are eating more cost_delay points than you think. Tested that. It does look like if I increase vacuum_cost_limit to 1

[HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-16 Thread Josh Berkus
All, This is something I'd swear we fixed around 8.3.2. However, I'm seeing it again in production, and was wondering if anyone could remember what the root cause was and how we fixed it. The problem is that sometimes (but not the majority of times) autovacuum with cost_delay is going into a

Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-16 Thread Joe Conway
On 08/16/2010 11:24 AM, Josh Berkus wrote: All, This is something I'd swear we fixed around 8.3.2. However, I'm seeing it again in production, and was wondering if anyone could remember what the root cause was and how we fixed it. I've also recently heard a report of vacuum hanging on 8.3.x

Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-16 Thread Josh Berkus
I've also recently heard a report of vacuum hanging on 8.3.x on Solaris Sparc. Any chance you can get a backtrace from a build with debug symbols? The problem is that we haven't been able to reproduce the bug in testing. Like I said, it only seems to happen occasionally ... like maybe once in

Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-16 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes: This is something I'd swear we fixed around 8.3.2. However, I'm seeing it again in production, and was wondering if anyone could remember what the root cause was and how we fixed it. Hmm, I can't find anything in the 8.3-series CVS logs suggesting that

Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-16 Thread Joe Conway
On 08/16/2010 12:12 PM, Josh Berkus wrote: I've also recently heard a report of vacuum hanging on 8.3.x on Solaris Sparc. Any chance you can get a backtrace from a build with debug symbols? The problem is that we haven't been able to reproduce the bug in testing. Like I said, it only

Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-16 Thread Alvaro Herrera
Excerpts from Joe Conway's message of lun ago 16 16:47:19 -0400 2010: On 08/16/2010 12:12 PM, Josh Berkus wrote: I've also recently heard a report of vacuum hanging on 8.3.x on Solaris Sparc. Any chance you can get a backtrace from a build with debug symbols? The problem is that we

Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-16 Thread Alvaro Herrera
Excerpts from Alvaro Herrera's message of lun ago 16 16:58:31 -0400 2010: I suspect that the problem may lie in the cost_delay rebalance code in autovacuum. Hmm, so we have this code: void AutoVacuumUpdateDelay(void) { if (MyWorkerInfo) { VacuumCostDelay =

Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?

2010-08-16 Thread Josh Berkus
Another idea that comes to mind is that you have vacuum_cost_page_dirty set to an unreasonably large value, so that autovac is blocking whenever it has to write even one page. Nope. Default. And total cost was raised to 1000. -- Josh Berkus
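
To make the cost-limit arithmetic in this exchange concrete, here is a minimal sketch of how many pages of one kind can be charged before vacuum naps. It assumes the 8.3-era default charges (hit 1, miss 10, dirty 20), which are not stated in the thread, and it assumes the nap is taken once the accumulated balance reaches the limit. The helper `pages_before_sleep` is hypothetical, not a PostgreSQL function.

```c
/* Assumed per-page charges (8.3-era defaults; not taken from the thread). */
enum {
    COST_PAGE_HIT   = 1,   /* vacuum_cost_page_hit   */
    COST_PAGE_MISS  = 10,  /* vacuum_cost_page_miss  */
    COST_PAGE_DIRTY = 20   /* vacuum_cost_page_dirty */
};

/*
 * Hypothetical helper: number of pages, each charged cost_per_page, that
 * are processed before the accumulated balance reaches cost_limit and
 * vacuum sleeps for vacuum_cost_delay. Returns 0 for a non-positive cost.
 */
static int pages_before_sleep(int cost_limit, int cost_per_page)
{
    if (cost_per_page <= 0)
        return 0;
    /* Round up: the sleep happens once the balance reaches the limit. */
    return (cost_limit + cost_per_page - 1) / cost_per_page;
}
```

With the limit of 1000 mentioned above, `pages_before_sleep(1000, COST_PAGE_DIRTY)` is 50, so a default vacuum_cost_page_dirty alone would not make autovacuum sleep after every single page write.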