Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)

2005-11-01 Thread Tom Lane
"Jim C. Nasby" <[EMAIL PROTECTED]> writes: > Doesn't clog use the same code? Yeah, but all three of your examples were referencing pg_subtrans, as proven by the stack traces and the contents of the shared control block. Even though the bug seems completely clog.c's fault, this is not a coincidenc

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)

2005-11-01 Thread Tom Lane
I wrote: > Even though the bug seems completely clog.c's fault, s/clog.c/slru.c/ of course :-( regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)

2005-11-01 Thread Jim C. Nasby
On Tue, Nov 01, 2005 at 11:23:55AM -0300, Alvaro Herrera wrote: > Tom Lane wrote: > > "Jim C. Nasby" <[EMAIL PROTECTED]> writes: > > > AFAIK they're not using subtransactions at all, but I'll check. > > > > Well, yeah, they are ... else you'd never have seen this failure. > > Maybe it's in plpgsq

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)

2005-11-01 Thread Jim C. Nasby
On Tue, Nov 01, 2005 at 11:23:55AM -0300, Alvaro Herrera wrote: > Tom Lane wrote: > > "Jim C. Nasby" <[EMAIL PROTECTED]> writes: > > > AFAIK they're not using subtransactions at all, but I'll check. > > > > Well, yeah, they are ... else you'd never have seen this failure. > > Maybe it's in plpgsq

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)

2005-11-01 Thread Alvaro Herrera
Tom Lane wrote: > "Jim C. Nasby" <[EMAIL PROTECTED]> writes: > > AFAIK they're not using subtransactions at all, but I'll check. > > Well, yeah, they are ... else you'd never have seen this failure. Maybe it's in plpgsql EXCEPTION clauses. -- Alvaro Herrera Valdivia, Chile ICBM: S 39ยบ 4

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)

2005-10-31 Thread Tom Lane
"Jim C. Nasby" <[EMAIL PROTECTED]> writes: > AFAIK they're not using subtransactions at all, but I'll check. Well, yeah, they are ... else you'd never have seen this failure. regards, tom lane ---(end of broadcast)--- TIP 9:

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)

2005-10-31 Thread Jim C. Nasby
On Mon, Oct 31, 2005 at 09:02:59PM -0300, Alvaro Herrera wrote: > Jim C. Nasby wrote: > > Now that I've got a little better idea of what this code does, I've > > noticed something interesting... this issue is happening on an 8-way > > machine, and NUM_SLRU_BUFFERS is currently defined at 8. Doesn't

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)

2005-10-31 Thread Alvaro Herrera
Jim C. Nasby wrote: > Now that I've got a little better idea of what this code does, I've > noticed something interesting... this issue is happening on an 8-way > machine, and NUM_SLRU_BUFFERS is currently defined at 8. Doesn't this > greatly increase the odds of buffer conflicts? Bug aside, would

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)

2005-10-31 Thread Jim C. Nasby
Now that I've got a little better idea of what this code does, I've noticed something interesting... this issue is happening on an 8-way machine, and NUM_SLRU_BUFFERS is currently defined at 8. Doesn't this greatly increase the odds of buffer conflicts? Bug aside, would it be better to set NUM_SLRU

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)

2005-10-31 Thread Jim C. Nasby
On Mon, Oct 31, 2005 at 01:05:06PM -0500, Tom Lane wrote: > "Jim C. Nasby" <[EMAIL PROTECTED]> writes: > > On Sun, Oct 30, 2005 at 06:17:53PM -0500, Tom Lane wrote: > >> This won't do as a permanent patch, because it isn't guaranteed to fix > >> the problem on machines that don't strongly order wri

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)

2005-10-31 Thread Tom Lane
"Jim C. Nasby" <[EMAIL PROTECTED]> writes: > On Sun, Oct 30, 2005 at 06:17:53PM -0500, Tom Lane wrote: >> This won't do as a permanent patch, because it isn't guaranteed to fix >> the problem on machines that don't strongly order writes, but it should >> work on Opterons, at least well enough to co

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)

2005-10-31 Thread Jim C. Nasby
Sorry, two more things... Will increasing shared_buffers make this less likely to occur? Or is this just something that's likely to happen when there are things like seqscans that are putting buffers near the front of the LRU? (The 8.0.3 buffer manager does something like that, right?) Is this so

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)

2005-10-31 Thread Jim C. Nasby
On Sun, Oct 30, 2005 at 06:17:53PM -0500, Tom Lane wrote: > I'd like Jim to test this theory by seeing if it helps to reverse the > order of the if-test elements at lines 294/295, ie make it look like > > if (shared->page_status[slotno] != SLRU_PAGE_READ_IN_PROGRESS || > shared

slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)

2005-10-30 Thread Tom Lane
OK, I think I see it. The problem is that the code in slru.c is careful about not modifying state when it doesn't hold the proper lock, but not so careful about not *inspecting* state without the proper lock. In particular consider these lines in SimpleLruReadPage (line numbers are as in CVS tip)