Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2013-03-15 Thread Tom Lane
Greg Smith writes: > On 1/27/13 2:32 AM, Satoshi Nagayasu wrote: >> This patch is intended to improve warning message at >> AtEOXact_Buffers(), but I guess another function, >> AtProcExit_Buffers(), needs to be modified as well. Right? > Yes, good catch. I've attached an updated patch that does

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2013-03-03 Thread Greg Smith
On 1/27/13 2:32 AM, Satoshi Nagayasu wrote: This patch is intended to improve warning message at AtEOXact_Buffers(), but I guess another function, AtProcExit_Buffers(), needs to be modified as well. Right? Yes, good catch. I've attached an updated patch that does the same sort of modificatio

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2013-01-26 Thread Satoshi Nagayasu
Hi, I just reviewed this patch. https://commitfest.postgresql.org/action/patch_view?id=1035 2013/1/13 Greg Smith: > On 12/26/12 7:23 PM, Greg Stark wrote: >> >> It's also possible it's a bad cpu, not bad memory. If it affects >> decrement or increment in particular it's possible that the patter

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2013-01-15 Thread Bruce Momjian
On Sun, Jan 13, 2013 at 12:34:07AM -0500, Greg Smith wrote: > On 12/26/12 7:23 PM, Greg Stark wrote: > >It's also possible it's a bad cpu, not bad memory. If it affects > >decrement or increment in particular it's possible that the pattern of > >usage on LocalRefCount is particularly prone to trigg

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2013-01-12 Thread Greg Smith
On 12/26/12 7:23 PM, Greg Stark wrote: It's also possible it's a bad cpu, not bad memory. If it affects decrement or increment in particular it's possible that the pattern of usage on LocalRefCount is particularly prone to triggering it. This looks to be the winning answer. It turns out that u

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2013-01-09 Thread Greg Smith
On 12/23/12 3:17 PM, Simon Riggs wrote: We already have PrintBufferLeakWarning() for this, which might be a bit neater. It does look like basically the same info. I hacked the code to generate this warning all the time. Patch from Andres I've been using: WARNING: refcount of buf 1 contain

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-30 Thread Greg Stark
On Sun, Dec 30, 2012 at 3:07 AM, Greg Smith wrote: > It is a strange power of two to be appearing there. I can follow your > reasoning for why this could be a bit flipping error. There's no sign of > that elsewhere though, no other crashes under load. I'm using this server > here because it's w

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-29 Thread Peter Geoghegan
On 30 December 2012 04:37, Robert Haas wrote: > On Sat, Dec 29, 2012 at 10:07 PM, Greg Smith wrote: >> It is a strange power of two to be appearing there. I can follow your >> reasoning for why this could be a bit flipping error. There's no sign of >> that elsewhere though, no other crashes und

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-29 Thread Robert Haas
On Sat, Dec 29, 2012 at 10:07 PM, Greg Smith wrote: > It is a strange power of two to be appearing there. I can follow your > reasoning for why this could be a bit flipping error. There's no sign of > that elsewhere though, no other crashes under load. I'm using this server > here because it's

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-29 Thread Greg Smith
On 12/27/12 7:43 AM, Greg Stark wrote: If it's always the first buffer then it could conceivably still be some other heap allocated object that always lands before LocalRefCount. It does seem a bit weird to be storing 1<<30 though -- there are no 1<<30 constants that we might be storing for examp

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-29 Thread Robert Haas
On Thu, Dec 27, 2012 at 11:33 AM, Tom Lane wrote: > Greg Stark writes: >> On Thu, Dec 27, 2012 at 3:17 AM, Tom Lane wrote: >>> The thing that this theory has a hard time with is that the buffer's >>> global refcount is zero. If you assume that there's a bit that >>> sometimes randomly goes to 1

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-27 Thread Tom Lane
Greg Stark writes: > On Thu, Dec 27, 2012 at 3:17 AM, Tom Lane wrote: >> The thing that this theory has a hard time with is that the buffer's >> global refcount is zero. If you assume that there's a bit that >> sometimes randomly goes to 1 when it should be 0, then what I'd expect >> to typicall

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-27 Thread Greg Stark
On Thu, Dec 27, 2012 at 3:17 AM, Tom Lane wrote: > The thing that this theory has a hard time with is that the buffer's > global refcount is zero. If you assume that there's a bit that > sometimes randomly goes to 1 when it should be 0, then what I'd expect > to typically happen is that UnpinBuff

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-26 Thread Tom Lane
Greg Stark writes: > On Wed, Dec 26, 2012 at 11:47 PM, Greg Smith wrote: >> It would be nice if this were just something like a memory issue on this >> system. That I'm getting the same very odd value every time--this refcount >> of 1073741824--makes it seem less random than I expect from bad me

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-26 Thread Greg Stark
On Wed, Dec 26, 2012 at 11:47 PM, Greg Smith wrote: > It would be nice if this were just something like a memory issue on this > system. That I'm getting the same very odd value every time--this refcount > of 1073741824--makes it seem less random than I expect from bad memory. > Once I get a few

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-26 Thread Greg Smith
On 12/26/12 5:28 PM, Greg Stark wrote: Did you ever say what kind of hardware it was? This is the local reference count so I can't see how it could be a race condition or anything like that but it sure smells a bit like one. Agreed, that smell is the reason I'm proceeding so far like this is an

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-26 Thread Greg Smith
On 12/26/12 5:40 PM, Greg Stark wrote: Also, do you have the buffer id of the broken buffer? I wonder if it's not just any buffer but always the same buffer even if it's a different block in that buffer. I just added something looking for that. Before I got to that I found another crash:

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-26 Thread Greg Stark
On Wed, Dec 26, 2012 at 6:33 PM, Tom Lane wrote: > Yeah, that destroys my theory that there's something broken about index > management specifically. Now we're looking for something that can > affect any buffer's refcount, which more than likely means it has > nothing to do with the buffer's cont

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-26 Thread Greg Stark
On Wed, Dec 26, 2012 at 6:33 PM, Tom Lane wrote: > Yeah, that destroys my theory that there's something broken about index > management specifically. Now we're looking for something that can > affect any buffer's refcount, which more than likely means it has > nothing to do with the buffer's cont

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-26 Thread Greg Smith
On 12/26/12 1:58 PM, anara...@anarazel.de wrote: I don't think it's necessarily only one buffer - if I read the above output correctly Greg used the suggested debug output which just put the elog(WARN) before the Assert... Greg, could you output all "bad" buffers and only assert after the loop

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-26 Thread anara...@anarazel.de
Tom Lane wrote: >Greg Smith writes: >> To try and speed up replicating this problem I switched to a smaller >> database scale, 100, and I was able to get a crash there. Here's the > >> latest: > >> 2012-12-26 00:01:19 EST [2278]: WARNING: refcount of >base/16384/57610 >> blockNum=118571,

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-26 Thread Tom Lane
Greg Smith writes: > To try and speed up replicating this problem I switched to a smaller > database scale, 100, and I was able to get a crash there. Here's the > latest: > 2012-12-26 00:01:19 EST [2278]: WARNING: refcount of base/16384/57610 > blockNum=118571, flags=0x106 is 1073741824 shou

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-26 Thread Greg Smith
To try and speed up replicating this problem I switched to a smaller database scale, 100, and I was able to get a crash there. Here's the latest: 2012-12-26 00:01:19 EST [2278]: WARNING: refcount of base/16384/57610 blockNum=118571, flags=0x106 is 1073741824 should be 0, globally: 0 2012-12-

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-24 Thread Tom Lane
Greg Smith writes: > I kicked off another test that includes the block number just before Tom > suggested it, so I should have the block by tomorrow at the latest. The > range of runtime before crash is 3 to 14 hours so far. Cool. Once you get the crash, please also capture the contents of th

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-24 Thread Simon Riggs
On 24 December 2012 16:25, Greg Smith wrote: > On 12/24/12 11:10 AM, Simon Riggs wrote: > >> I wonder if you're having a hardware problem? > > > Always possible. I didn't report this until I had replicated the crash and > seen exactly the same thing twice. I've seen it crash on this assertion 6

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-24 Thread Greg Smith
On 12/24/12 11:10 AM, Simon Riggs wrote: I wonder if you're having a hardware problem? Always possible. I didn't report this until I had replicated the crash and seen exactly the same thing twice. I've seen it crash on this assertion 6 times now. Bad hardware is not normally so consistent

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-24 Thread Simon Riggs
On 24 December 2012 16:07, Tom Lane wrote: > Huh. Looks a bit like overflow of the refcount, which would explain why > it takes such a long test case to reproduce it. But how could that be > happening without somebody forgetting to decrement the refcount, which > ought to lead to a visible fail

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-24 Thread Simon Riggs
On 24 December 2012 15:57, Greg Smith wrote: > 2012-12-24 06:08:46 EST [26015]: WARNING: refcount of base/16384/49169 is > 1073741824 should be 0, globally: 0 > > That is pgbench_accounts_pkey. 1073741824 = > 0100 0000 0000 0000 0000 0000 0000 0000 = 2^30 > > Pretty odd value to find in a Priva

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-24 Thread Tom Lane
Greg Smith writes: > I did get some output from the variation Andres suggested. There was > exactly one screwed up buffer: > 2012-12-24 06:08:46 EST [26015]: WARNING: refcount of base/16384/49169 > is 1073741824 should be 0, globally: 0 > That is pgbench_accounts_pkey. 1073741824 = > 0100 0

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-24 Thread Greg Smith
On 12/23/12 3:17 PM, Simon Riggs wrote: We already have PrintBufferLeakWarning() for this, which might be a bit neater. Maybe. I tried using this, and I just got a seg fault within that code. I can't figure out if I called it incorrectly or if the buffer involved is so damaged that PrintBuff

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-23 Thread Simon Riggs
On 23 December 2012 21:52, Greg Smith wrote: > On 12/23/12 3:17 PM, Simon Riggs wrote: >> >> If that last change was the cause, then it's caused within VACUUM. I'm >> running a thrash test with autovacuums set much more frequently but >> nothing yet. > > > I am not very suspicious of that VACUUM ch

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-23 Thread Greg Smith
On 12/23/12 3:17 PM, Simon Riggs wrote: If that last change was the cause, then it's caused within VACUUM. I'm running a thrash test with autovacuums set much more frequently but nothing yet. I am not very suspicious of that VACUUM change; just pointed it out for completeness' sake. Are you b

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-23 Thread Simon Riggs
On 23 December 2012 19:42, Greg Smith wrote: > diff --git a/src/backend/storage/buffer/bufmgr.c > b/src/backend/storage/buffer/bufmgr.c > index dddb6c0..df43643 100644 > --- a/src/backend/storage/buffer/bufmgr.c > +++ b/src/backend/storage/buffer/bufmgr.c > @@ -1697,11 +1697,21 @@ AtEOXact_Buffer

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-23 Thread Greg Smith
On 12/23/12 1:10 PM, Tom Lane wrote: It might also be interesting to know if there is more than one still-pinned buffer --- that is, if you're going to hack the code, fix it to elog(LOG) each pinned buffer and then panic after completing the loop. Easy enough; I kept it so the actual source o

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-23 Thread Tom Lane
Andres Freund writes: >> This is my first test like this against 9.3 development though, so the cause >> could be an earlier commit. I'm just starting with the most recent work as >> the first suspect. Next I think I'll try autovacuum=off and see if the >> crash goes away. Other ideas are welco

Re: [HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-23 Thread Andres Freund
Hi, On 2012-12-23 02:36:42 -0500, Greg Smith wrote: > I'm testing a checkout from a few days ago and trying to complete a day-long > pgbench stress test, with assertions and debugging on. I want to make sure > the base code works as expected before moving on to testing checksums. It's > crashing

[HACKERS] buffer assertion tripping under repeat pgbench load

2012-12-22 Thread Greg Smith
I'm testing a checkout from a few days ago and trying to complete a day-long pgbench stress test, with assertions and debugging on. I want to make sure the base code works as expected before moving on to testing checksums. It's crashing before finishing though. Here's a sample: 2012-12-20 2