Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2013-03-15 Thread Tom Lane
Greg Smith g...@2ndquadrant.com writes: On 1/27/13 2:32 AM, Satoshi Nagayasu wrote: This patch is intended to improve warning message at AtEOXact_Buffers(), but I guess another function, AtProcExit_Buffers(), needs to be modified as well. Right? Yes, good catch. I've attached an updated

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2013-03-03 Thread Greg Smith
On 1/27/13 2:32 AM, Satoshi Nagayasu wrote: This patch is intended to improve warning message at AtEOXact_Buffers(), but I guess another function, AtProcExit_Buffers(), needs to be modified as well. Right? Yes, good catch. I've attached an updated patch that does the same sort of

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2013-01-26 Thread Satoshi Nagayasu
Hi, I just reviewed this patch. https://commitfest.postgresql.org/action/patch_view?id=1035 2013/1/13 Greg Smith g...@2ndquadrant.com: On 12/26/12 7:23 PM, Greg Stark wrote: It's also possible it's a bad cpu, not bad memory. If it affects decrement or increment in particular it's possible

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2013-01-15 Thread Bruce Momjian
On Sun, Jan 13, 2013 at 12:34:07AM -0500, Greg Smith wrote: On 12/26/12 7:23 PM, Greg Stark wrote: It's also possible it's a bad cpu, not bad memory. If it affects decrement or increment in particular it's possible that the pattern of usage on LocalRefCount is particularly prone to triggering

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2013-01-12 Thread Greg Smith
On 12/26/12 7:23 PM, Greg Stark wrote: It's also possible it's a bad cpu, not bad memory. If it affects decrement or increment in particular it's possible that the pattern of usage on LocalRefCount is particularly prone to triggering it. This looks to be the winning answer. It turns out that

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2013-01-09 Thread Greg Smith
On 12/23/12 3:17 PM, Simon Riggs wrote: We already have PrintBufferLeakWarning() for this, which might be a bit neater. It does look like basically the same info. I hacked the code to generate this warning all the time. Patch from Andres I've been using: WARNING: refcount of buf 1

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-30 Thread Greg Stark
On Sun, Dec 30, 2012 at 3:07 AM, Greg Smith g...@2ndquadrant.com wrote: It is a strange power of two to be appearing there. I can follow your reasoning for why this could be a bit flipping error. There's no sign of that elsewhere though, no other crashes under load. I'm using this server

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-29 Thread Robert Haas
On Thu, Dec 27, 2012 at 11:33 AM, Tom Lane t...@sss.pgh.pa.us wrote: Greg Stark st...@mit.edu writes: On Thu, Dec 27, 2012 at 3:17 AM, Tom Lane t...@sss.pgh.pa.us wrote: The thing that this theory has a hard time with is that the buffer's global refcount is zero. If you assume that there's a

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-29 Thread Greg Smith
On 12/27/12 7:43 AM, Greg Stark wrote: If it's always the first buffer then it could conceivably still be some other heap allocated object that always lands before LocalRefCount. It does seem a bit weird to be storing 1<<30 though -- there are no 1<<30 constants that we might be storing for example.

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-29 Thread Robert Haas
On Sat, Dec 29, 2012 at 10:07 PM, Greg Smith g...@2ndquadrant.com wrote: It is a strange power of two to be appearing there. I can follow your reasoning for why this could be a bit flipping error. There's no sign of that elsewhere though, no other crashes under load. I'm using this server

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-29 Thread Peter Geoghegan
On 30 December 2012 04:37, Robert Haas robertmh...@gmail.com wrote: On Sat, Dec 29, 2012 at 10:07 PM, Greg Smith g...@2ndquadrant.com wrote: It is a strange power of two to be appearing there. I can follow your reasoning for why this could be a bit flipping error. There's no sign of that

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-27 Thread Greg Stark
On Thu, Dec 27, 2012 at 3:17 AM, Tom Lane t...@sss.pgh.pa.us wrote: The thing that this theory has a hard time with is that the buffer's global refcount is zero. If you assume that there's a bit that sometimes randomly goes to 1 when it should be 0, then what I'd expect to typically happen is

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-27 Thread Tom Lane
Greg Stark st...@mit.edu writes: On Thu, Dec 27, 2012 at 3:17 AM, Tom Lane t...@sss.pgh.pa.us wrote: The thing that this theory has a hard time with is that the buffer's global refcount is zero. If you assume that there's a bit that sometimes randomly goes to 1 when it should be 0, then what

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-26 Thread Greg Smith
To try and speed up replicating this problem I switched to a smaller database scale, 100, and I was able to get a crash there. Here's the latest: 2012-12-26 00:01:19 EST [2278]: WARNING: refcount of base/16384/57610 blockNum=118571, flags=0x106 is 1073741824 should be 0, globally: 0

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-26 Thread Tom Lane
Greg Smith g...@2ndquadrant.com writes: To try and speed up replicating this problem I switched to a smaller database scale, 100, and I was able to get a crash there. Here's the latest: 2012-12-26 00:01:19 EST [2278]: WARNING: refcount of base/16384/57610 blockNum=118571, flags=0x106 is

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-26 Thread anara...@anarazel.de
Tom Lane t...@sss.pgh.pa.us schrieb: Greg Smith g...@2ndquadrant.com writes: To try and speed up replicating this problem I switched to a smaller database scale, 100, and I was able to get a crash there. Here's the latest: 2012-12-26 00:01:19 EST [2278]: WARNING: refcount of

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-26 Thread Greg Smith
On 12/26/12 1:58 PM, anara...@anarazel.de wrote: I don't think it's necessarily only one buffer - if I read the above output correctly Greg used the suggested debug output which just put the elog(WARN) before the Assert... Greg, could you output all bad buffers and only assert after the loop

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-26 Thread Greg Stark
On Wed, Dec 26, 2012 at 6:33 PM, Tom Lane t...@sss.pgh.pa.us wrote: Yeah, that destroys my theory that there's something broken about index management specifically. Now we're looking for something that can affect any buffer's refcount, which more than likely means it has nothing to do with

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-26 Thread Greg Stark
On Wed, Dec 26, 2012 at 6:33 PM, Tom Lane t...@sss.pgh.pa.us wrote: Yeah, that destroys my theory that there's something broken about index management specifically. Now we're looking for something that can affect any buffer's refcount, which more than likely means it has nothing to do with

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-26 Thread Greg Smith
On 12/26/12 5:40 PM, Greg Stark wrote: Also, do you have the buffer id of the broken buffer? I wonder if it's not just any buffer but always the same buffer even if it's a different block in that buffer. I just added something looking for that. Before I got to that I found another crash:

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-26 Thread Greg Smith
On 12/26/12 5:28 PM, Greg Stark wrote: Did you ever say what kind of hardware it was? This is the local reference count so I can't see how it could be a race condition or anything like that but it sure smells a bit like one. Agreed, that smell is the reason I'm proceeding so far like this is

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-26 Thread Greg Stark
On Wed, Dec 26, 2012 at 11:47 PM, Greg Smith g...@2ndquadrant.com wrote: It would be nice if this were just something like a memory issue on this system. That I'm getting the same very odd value every time--this refcount of 1073741824--makes it seem less random than I expect from bad memory.

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-26 Thread Tom Lane
Greg Stark st...@mit.edu writes: On Wed, Dec 26, 2012 at 11:47 PM, Greg Smith g...@2ndquadrant.com wrote: It would be nice if this were just something like a memory issue on this system. That I'm getting the same very odd value every time--this refcount of 1073741824--makes it seem less

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-24 Thread Greg Smith
On 12/23/12 3:17 PM, Simon Riggs wrote: We already have PrintBufferLeakWarning() for this, which might be a bit neater. Maybe. I tried using this, and I just got a seg fault within that code. I can't figure out if I called it incorrectly or if the buffer involved is so damaged that

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-24 Thread Tom Lane
Greg Smith g...@2ndquadrant.com writes: I did get some output from the variation Andres suggested. There was exactly one screwed up buffer: 2012-12-24 06:08:46 EST [26015]: WARNING: refcount of base/16384/49169 is 1073741824 should be 0, globally: 0 That is pgbench_accounts_pkey.

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-24 Thread Simon Riggs
On 24 December 2012 15:57, Greg Smith g...@2ndquadrant.com wrote: 2012-12-24 06:08:46 EST [26015]: WARNING: refcount of base/16384/49169 is 1073741824 should be 0, globally: 0 That is pgbench_accounts_pkey. 1073741824 = 01000000000000000000000000000000 = 2^30 Pretty odd value to
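The observation that the reported refcount is exactly 2^30 can be checked mechanically. The following is a minimal sketch, not code from the thread; the helper names is_single_bit and lowest_set_bit are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 when exactly one bit is set in v, 0 otherwise. */
static int is_single_bit(uint32_t v)
{
    /* v & (v - 1) clears the lowest set bit; zero means there was only one. */
    return v != 0 && (v & (v - 1)) == 0;
}

/* Returns the index of the lowest set bit; v must be nonzero. */
static int lowest_set_bit(uint32_t v)
{
    int pos = 0;

    assert(v != 0);
    while ((v & 1u) == 0)
    {
        v >>= 1;
        pos++;
    }
    return pos;
}
```

Applied to 1073741824, is_single_bit reports a single set bit and lowest_set_bit reports position 30, which is what makes a flipped bit a plausible reading of the value.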

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-24 Thread Simon Riggs
On 24 December 2012 16:07, Tom Lane t...@sss.pgh.pa.us wrote: Huh. Looks a bit like overflow of the refcount, which would explain why it takes such a long test case to reproduce it. But how could that be happening without somebody forgetting to decrement the refcount, which ought to lead to

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-24 Thread Greg Smith
On 12/24/12 11:10 AM, Simon Riggs wrote: I wonder if you're having a hardware problem? Always possible. I didn't report this until I had replicated the crash and seen exactly the same thing twice. I've seen it crash on this assertion 6 times now. Bad hardware is not normally so

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-24 Thread Simon Riggs
On 24 December 2012 16:25, Greg Smith g...@2ndquadrant.com wrote: On 12/24/12 11:10 AM, Simon Riggs wrote: I wonder if you're having a hardware problem? Always possible. I didn't report this until I had replicated the crash and seen exactly the same thing twice. I've seen it crash on this

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-24 Thread Tom Lane
Greg Smith g...@2ndquadrant.com writes: I kicked off another test that includes the block number just before Tom suggested it, so I should have the block by tomorrow at the latest. The range of runtime before crash is 3 to 14 hours so far. Cool. Once you get the crash, please also capture

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-23 Thread Andres Freund
Hi, On 2012-12-23 02:36:42 -0500, Greg Smith wrote: I'm testing a checkout from a few days ago and trying to complete a day long pgbench stress test, with assertions and debugging on. I want to make sure the base code works as expected before moving on to testing checksums. It's crashing

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-23 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes: This is my first test like this against 9.3 development though, so the cause could be an earlier commit. I'm just starting with the most recent work as the first suspect. Next I think I'll try autovacuum=off and see if the crash goes away. Other

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-23 Thread Greg Smith
On 12/23/12 1:10 PM, Tom Lane wrote: It might also be interesting to know if there is more than one still-pinned buffer --- that is, if you're going to hack the code, fix it to elog(LOG) each pinned buffer and then panic after completing the loop. Easy enough; I kept it so the actual source
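The debugging approach suggested here, reporting every still-pinned buffer and failing only after the loop completes, might look roughly like the sketch below. It is a self-contained illustration under stated assumptions, not the patch from the thread: the plain refcounts array, report_pinned, and check_no_pins are hypothetical stand-ins, and fprintf replaces the backend's elog().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a backend-local refcount array; the real
 * bufmgr.c structures are not reproduced here.
 * Logs every entry with a nonzero count and returns how many were found.
 */
static int report_pinned(const int32_t *refcounts, int nbuffers)
{
    int leaked = 0;

    for (int i = 0; i < nbuffers; i++)
    {
        if (refcounts[i] != 0)
        {
            fprintf(stderr,
                    "WARNING: buffer %d has refcount %d, should be 0\n",
                    i, (int) refcounts[i]);
            leaked++;
        }
    }
    return leaked;
}

/* Fails only after the whole array has been scanned and reported. */
static void check_no_pins(const int32_t *refcounts, int nbuffers)
{
    int leaked = report_pinned(refcounts, nbuffers);

    assert(leaked == 0);
}
```

Deferring the assertion until after the scan means one bad entry does not hide any others, which is the point of the request to log each pinned buffer before panicking.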

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-23 Thread Simon Riggs
On 23 December 2012 19:42, Greg Smith g...@2ndquadrant.com wrote: diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c index dddb6c0..df43643 100644 --- a/src/backend/storage/buffer/bufmgr.c +++ b/src/backend/storage/buffer/bufmgr.c @@ -1697,11 +1697,21 @@

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-23 Thread Greg Smith
On 12/23/12 3:17 PM, Simon Riggs wrote: If that last change was the cause, then it's caused within VACUUM. I'm running a thrash test with autovacuums set much more frequently but nothing yet. I am not very suspicious of that VACUUM change; just pointed it out for completeness' sake. Are you

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-23 Thread Simon Riggs
On 23 December 2012 21:52, Greg Smith g...@2ndquadrant.com wrote: On 12/23/12 3:17 PM, Simon Riggs wrote: If that last change was the cause, then it's caused within VACUUM. I'm running a thrash test with autovacuums set much more frequently but nothing yet. I am not very suspicious of that

[HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-22 Thread Greg Smith
I'm testing a checkout from a few days ago and trying to complete a day long pgbench stress test, with assertions and debugging on. I want to make sure the base code works as expected before moving on to testing checksums. It's crashing before finishing though. Here's a sample: 2012-12-20