Re: [HACKERS] catalog corruption bug

2006-01-09 Jeremy Drake
On Sun, 8 Jan 2006, Tom Lane wrote: Yeah, that's not very surprising. Running the forced-cache-resets function will definitely expose that catcache bug pretty quickly. You'd need to apply the patches I put in yesterday to have a system that has any chance of withstanding that treatment for

Re: [HACKERS] catalog corruption bug

2006-01-09 Tom Lane
Jeremy Drake writes: I ran without that function you made, and it got the error, but not a crash. I stuck an Assert(false) right before the ereport for that particular error, and I did end up with a core there, but I don't see anything out of the ordinary (what little I know

Re: [HACKERS] catalog corruption bug

2006-01-09 Jeremy Drake
On Mon, 9 Jan 2006, Tom Lane wrote: Does your application drop these temp tables explicitly, or leave them to be dropped automatically during commit? It might be interesting to see whether changing that makes any difference. I drop them explicitly at the end of the function. I'm also

Re: [HACKERS] catalog corruption bug

2006-01-08 Tom Lane
Jeremy Drake writes: On Sat, 7 Jan 2006, Tom Lane wrote: A bit of a leap in the dark, but: maybe the triggering event for this situation is not a VACUUM pg_amop but a global cache reset due to sinval message buffer overrun. I tried that function you sent, while running my

Re: [HACKERS] catalog corruption bug

2006-01-07 Jeremy Drake
On Fri, 6 Jan 2006, Tom Lane wrote: OK, this must be a different issue then. I think we have seen reports like this one before, but not been able to reproduce it. Could you rebuild with Asserts enabled and see if any asserts trigger? I got an assert to fail. I'm not entirely sure if this

Re: [HACKERS] catalog corruption bug

2006-01-07 Tom Lane
Jeremy Drake writes: I got an assert to fail. I'm not entirely sure if this is helpful, but I managed to get a core dump with --enable-debug and --enable-cassert (with optimizations still on). Let me know if there is anything else that would be useful to get out of this

Re: [HACKERS] catalog corruption bug

2006-01-07 Jeremy Drake
On Sat, 7 Jan 2006, Tom Lane wrote: Fascinating --- that's not anywhere near where I thought your problem was. Which cache is this tuple in? (Print *ct->my_cache) $2 = { id = 3, cc_next = 0x2aac1048, cc_relname = 0x2ab19df8 "pg_amop", cc_reloid = 2602, cc_indexoid = 2654,

Re: [HACKERS] catalog corruption bug

2006-01-07 Tom Lane
Jeremy Drake writes: Am I correct in interpreting this as the hash opclass for Oid? No, it's the AMOPOPID catalog cache (containing rows from pg_amop indexed by amopopr/amopclaid). After digging around for a bit I noticed that catalog caches get flushed if someone vacuums the

Re: [HACKERS] catalog corruption bug

2006-01-07 Jeremy Drake
On Sat, 7 Jan 2006, Tom Lane wrote: Jeremy Drake writes: Am I correct in interpreting this as the hash opclass for Oid? However, AFAICS the only consequence of this bug is to trigger that Assert failure if you've got Asserts enabled. Dead catcache entries aren't actually

Re: [HACKERS] catalog corruption bug

2006-01-07 Tom Lane
Jeremy Drake writes: On Sat, 7 Jan 2006, Tom Lane wrote: I'll go fix CatCacheRemoveCList, but I think this is not the bug we're looking for. Incidentally, one of my processes did get that error at the same time. All of the other processes had an error DBD::Pg::st execute

Re: [HACKERS] catalog corruption bug

2006-01-07 Jeremy Drake
On Sat, 7 Jan 2006, Tom Lane wrote: Jeremy Drake writes: On Sat, 7 Jan 2006, Tom Lane wrote: I'll go fix CatCacheRemoveCList, but I think this is not the bug we're looking for. A bit of a leap in the dark, but: maybe the triggering event for this situation is not a

Re: [HACKERS] catalog corruption bug

2006-01-06 Jeremy Drake
On Thu, 5 Jan 2006, Tom Lane wrote: The ReadBuffer bug I just fixed could result in disappearance of catalog rows, so this observation is consistent with the theory that that's what's biting you. It's not proof though... Well, I applied that patch that you sent me the link to (the bufmgr.c

Re: [HACKERS] catalog corruption bug

2006-01-06 Tom Lane
Jeremy Drake writes: Well, I applied that patch that you sent me the link to (the bufmgr.c one), and rebuilt (PORTDIR_OVERLAY is cool...) I ran my nine processes which hammer things overnight, and in the morning one of them was dead. DBD::Pg::st execute failed: ERROR:

Re: [HACKERS] catalog corruption bug

2006-01-06 Jeremy Drake
On Fri, 6 Jan 2006, Tom Lane wrote: Jeremy Drake writes: Well, I applied that patch that you sent me the link to (the bufmgr.c one), and rebuilt (PORTDIR_OVERLAY is cool...) I ran my nine processes which hammer things overnight, and in the morning one of them was dead.

Re: [HACKERS] catalog corruption bug

2006-01-06 Tom Lane
Jeremy Drake writes: DBD::Pg::st execute failed: ERROR: duplicate key violates unique constraint pg_type_typname_nsp_index Hm, did you REINDEX things beforehand? This could be leftover corruption... Yes. I ran that VACUUM FULL ANALYZE VERBOSE which I emailed part of the

Re: [HACKERS] catalog corruption bug

2006-01-05 Tom Lane
Jeremy Drake writes: We have encountered a very nasty but apparently rare bug which appears to result in catalog corruption. I've been fooling around with this report today. In several hours of trying, I've been able to get one Assert failure from running Jeremy's example on

Re: [HACKERS] catalog corruption bug

2006-01-05 Jeremy Drake
Here is some additional information that I have managed to gather today regarding this. It is not really what causes it, so much as what does not. I removed all plperl from the loading processes. I did a VACUUM FULL ANALYZE, and then I reindexed everything in the database (Including starting

Re: [HACKERS] catalog corruption bug

2006-01-05 Tom Lane
Jeremy Drake writes: Here is some additional information that I have managed to gather today regarding this. It is not really what causes it, so much as what does not. ... Similar for pg_type, there being 248 index row versions vs 244 row versions in the table. The

Re: [HACKERS] catalog corruption bug

2006-01-04 Jeremy Drake
On Wed, 21 Dec 2005, Tom Lane wrote: Jeremy Drake writes: We have encountered a very nasty but apparently rare bug which appears to result in catalog corruption. How much of this can you reproduce on 8.1.1? We've fixed a few issues already. We did not see this problem

[HACKERS] catalog corruption bug

2005-12-21 Jeremy Drake
We have encountered a very nasty but apparently rare bug which appears to result in catalog corruption. I have not been able to pin down an exact sequence of events which causes this problem; it appears to be a race condition of some sort. This is what I have been able to figure out so far. * It

Re: [HACKERS] catalog corruption bug

2005-12-21 Tom Lane
Jeremy Drake writes: We have encountered a very nasty but apparently rare bug which appears to result in catalog corruption. How much of this can you reproduce on 8.1.1? We've fixed a few issues already. This was built from the gentoo ebuild version 8.1.0. I'd be even more