On 2/19/17 5:27 AM, Robert Haas wrote:
(1) a multi-batch hash join, (2) a nested loop,
and (3) a merge join. (2) is easy to implement but will generate a
ton of random I/O if the table is not resident in RAM. (3) is most
suitable for very large tables but takes more work to code, and is
also
On Sun, Feb 19, 2017 at 3:52 PM, Pavan Deolasee
wrote:
> This particular case of corruption results in a heap tuple getting indexed
> by a wrong key (or to be precise, indexed by its old value). So the only way
> to detect the corruption is to look at each index key and
On Sun, Feb 19, 2017 at 3:43 PM, Robert Haas wrote:
> On Fri, Feb 17, 2017 at 11:15 PM, Tom Lane wrote:
>
> > Ah, nah, scratch that. If any post-index-build pruning had occurred on a
> > page, the evidence would be gone --- the non-matching older
On Fri, Feb 17, 2017 at 11:15 PM, Tom Lane wrote:
> I wrote:
>> However, you might be able to find it without so much random I/O.
>> I'm envisioning a seqscan over the table, in which you simply look for
>> HOT chains in which the indexed columns aren't all the same. When you
On Fri, Feb 17, 2017 at 9:31 AM, Tom Lane wrote:
> This seems like it'd be quite a different tool than amcheck, though.
> Also, it would only find broken-HOT-chain corruption, which might be
> a rare enough issue to not deserve a single-purpose tool.
FWIW, my ambition for
I wrote:
> However, you might be able to find it without so much random I/O.
> I'm envisioning a seqscan over the table, in which you simply look for
> HOT chains in which the indexed columns aren't all the same. When you
> find one, you'd have to do a pretty expensive index lookup to confirm
>
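The heap-side check Tom Lane sketches above can be simulated in miniature: scan HOT chains and flag any chain whose members disagree on an indexed column. The chain/tuple representation below is invented for illustration; real PostgreSQL chains live in heap pages and this is only a sketch of the detection idea, not the actual tool.

```python
# Rough simulation of the proposed seqscan check: a HOT chain is only
# valid if every tuple in it agrees on all indexed columns, so chains
# that disagree are candidates for broken-HOT-chain corruption.

def find_suspect_chains(hot_chains, indexed_cols):
    """Return chains whose tuples don't all share the same indexed values."""
    suspects = []
    for chain in hot_chains:
        first = chain[0]
        if any(t[c] != first[c] for t in chain[1:] for c in indexed_cols):
            suspects.append(chain)
    return suspects

chains = [
    [{"id": 1, "email": "a@x"}, {"id": 1, "email": "a@x"}],  # healthy chain
    [{"id": 2, "email": "b@x"}, {"id": 2, "email": "c@x"}],  # indexed col changed
]
print(len(find_suspect_chains(chains, ["email"])))  # → 1
```

Only the suspect chains would then need the expensive index lookup Tom mentions, which is what keeps the random I/O bounded.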
Peter Geoghegan writes:
> The difference with a test that could detect this variety of
> corruption is that that would need to visit the heap, which tends to
> be much larger than any one index, or even all indexes. That would
> probably need to be random I/O, too. It might be
On Fri, Feb 17, 2017 at 8:23 AM, Keith Fiske wrote:
> It's not the load I'm worried about, it's the locks that are required at
> some point during the rebuild. Doing an index rebuild here and there isn't a
> big deal, but trying to do it for an entire heavily loaded,
Keith Fiske wrote:
> I can understand if it's simply not possible, but if it is, I think in
> any cases of data corruption, having some means to check for it to be sure
> you're in the clear would be useful.
Maybe it is possible. I just didn't try, since it didn't seem very
useful.
--
On Fri, Feb 17, 2017 at 11:12 AM, Alvaro Herrera
wrote:
> Keith Fiske wrote:
>
> > Was just curious if anyone was able to come up with any sort of method to
> > test whether an index was corrupted by this bug, other than just waiting
> > for bad query results? We've
Keith Fiske wrote:
> Was just curious if anyone was able to come up with any sort of method to
> test whether an index was corrupted by this bug, other than just waiting
> for bad query results? We've used concurrent index rebuilding quite
> extensively over the years to remove bloat from busy
On Mon, Feb 6, 2017 at 10:17 PM, Amit Kapila
wrote:
> On Mon, Feb 6, 2017 at 10:28 PM, Tom Lane wrote:
> > Amit Kapila writes:
> >> Hmm. Consider that the first time relcache invalidation occurs while
> >> computing
On Mon, Feb 6, 2017 at 10:28 PM, Tom Lane wrote:
> Amit Kapila writes:
>> Hmm. Consider that the first time relcache invalidation occurs while
>> computing id_attrs, so now the retry logic will compute the correct
>> set of attrs (considering two
On Mon, Feb 6, 2017 at 11:54 PM, Tom Lane wrote:
> After some discussion among the release team, we've concluded that the
> best thing to do is to push Pavan's/my patch into today's releases.
> This does not close the matter by any means: we should continue to
> study whether
After some discussion among the release team, we've concluded that the
best thing to do is to push Pavan's/my patch into today's releases.
This does not close the matter by any means: we should continue to
study whether there are related bugs or whether there's a more principled
way of fixing this
On Sun, Feb 5, 2017 at 9:42 PM, Pavan Deolasee wrote:
> On Mon, Feb 6, 2017 at 5:41 AM, Peter Geoghegan wrote:
>> On Sun, Feb 5, 2017 at 4:09 PM, Robert Haas wrote:
>> > I don't think this kind of black-and-white thinking is very
Alvaro Herrera writes:
> Tom Lane wrote:
>> Better to fix the callers so that they don't have the assumption you
>> refer to. Or maybe we could adjust the API of RelationGetIndexAttrBitmap
>> so that it returns all the sets needed by a given calling module at
>> once,
Tom Lane wrote:
> Better to fix the callers so that they don't have the assumption you
> refer to. Or maybe we could adjust the API of RelationGetIndexAttrBitmap
> so that it returns all the sets needed by a given calling module at
> once, which would allow us to guarantee they're consistent.
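The API shape Tom Lane suggests can be sketched as follows: compute every attribute set a caller needs from one snapshot of the index list, and return them together, so the sets can never reflect two different views of the indexes. All names here are invented stand-ins, not the PostgreSQL C API.

```python
# Sketch of "return all the sets at once": one pass over a single,
# consistent snapshot of the index definitions produces all three
# attribute sets, so callers cannot mix sets from two snapshots.

def get_all_attr_sets(index_defs):
    """index_defs: a single consistent snapshot of index definitions."""
    indexed, pk, identity = set(), set(), set()
    for idx in index_defs:
        indexed |= set(idx["columns"])           # hot_attrs analogue
        if idx.get("is_primary"):
            pk |= set(idx["columns"])            # pk_attrs analogue
        if idx.get("is_replica_identity"):
            identity |= set(idx["columns"])      # id_attrs analogue
    return indexed, pk, identity

snapshot = [
    {"columns": ["id"], "is_primary": True, "is_replica_identity": True},
    {"columns": ["email"]},
]
indexed, pk, identity = get_all_attr_sets(snapshot)
print(sorted(indexed))  # → ['email', 'id']
```

The point of the design is that consistency is guaranteed structurally rather than by careful call ordering in each caller.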
Amit Kapila writes:
> Hmm. Consider that the first time relcache invalidation occurs while
> computing id_attrs, so now the retry logic will compute the correct
> set of attrs (considering two indexes, if we take the example given by
> Alvaro above.). However, the attrs
Andres Freund wrote:
> To show what I mean here's an *unpolished* and *barely tested* patch
> implementing the first of my suggestions.
>
> Alvaro, Pavan, I think should address the issue as well?
Hmm, interesting idea. Maybe a patch along these lines is a good way to
make index-list cache
Tom Lane wrote:
> Pavan Deolasee writes:
> > 2. In the second patch, we tried to recompute attribute lists if a relcache
> > flush happens in between and index list is invalidated. We've seen problems
> > with that, especially it getting into an infinite loop with
> >
On 2/5/17 at 21:57, Tomas Vondra wrote:
>
> +1 to not rushing fixes into releases. While I think we now finally
> understand the mechanics of this bug, the fact that we came up with
> three different fixes in this thread, only to discover issues with each
> of them, warrants some caution.
> On Feb 6, 2017, at 4:57, Peter Geoghegan wrote:
>
> I meant that I find the fact that there were no user reports in all
> these years to be a good reason to not proceed for now in this
> instance.
Well, we had some strange situations with indexes (see below for example)
On Mon, Feb 6, 2017 at 9:47 AM, Pavan Deolasee wrote:
>
>
> On Mon, Feb 6, 2017 at 9:41 AM, Amit Kapila wrote:
>>
>>
>>
>> Hmm. Consider that the first time relcache invalidation occurs while
>> computing id_attrs, so now the retry logic will
On Mon, Feb 6, 2017 at 8:01 AM, Pavan Deolasee
wrote:
>
>>
> I like this approach. I'll run the patch on a build with
> CACHE_CLOBBER_ALWAYS, but I'm pretty sure it will be ok.
>
While it looks certain that the fix will miss this point release, FWIW I
ran this patch
On Mon, Feb 6, 2017 at 9:41 AM, Amit Kapila wrote:
>
>
> Hmm. Consider that the first time relcache invalidation occurs while
> computing id_attrs, so now the retry logic will compute the correct
> set of attrs (considering two indexes, if we take the example given by
>
On Mon, Feb 6, 2017 at 8:35 AM, Pavan Deolasee wrote:
>
>
> On Mon, Feb 6, 2017 at 8:15 AM, Amit Kapila wrote:
>>
>> On Mon, Feb 6, 2017 at 8:01 AM, Pavan Deolasee
>> wrote:
>> >
>> >
>> > On Mon, Feb 6, 2017 at 1:44
On 2017-02-05 22:34:34 -0500, Tom Lane wrote:
> Pavan Deolasee writes:
> The point is that there's a nontrivial chance of a hasty fix introducing
> worse problems than we fix.
>
> Given the lack of consensus about exactly how to fix this, I'm feeling
> like it's a good
Pavan Deolasee writes:
> On Mon, Feb 6, 2017 at 5:44 AM, Andres Freund wrote:
>> +1. I don't think we serve our users by putting out a nontrivial bugfix
>> hastily. Nor do I think we serve them in this instance by delaying the
>> release to cover
On 2017-02-05 21:49:57 -0500, Tom Lane wrote:
> Andres Freund writes:
> > I've not yet read the full thread, but I'm a bit confused so far. We
> > obviously can get changing information about indexes here, but isn't
> > that something we have to deal with anyway? If we
On Mon, Feb 6, 2017 at 5:44 AM, Andres Freund wrote:
> On 2017-02-05 12:51:09 -0500, Tom Lane wrote:
> > Michael Paquier writes:
> > > On Sun, Feb 5, 2017 at 6:53 PM, Pavel Stehule
> wrote:
> > >> I agree with Pavan - a
On Mon, Feb 6, 2017 at 8:15 AM, Amit Kapila wrote:
> On Mon, Feb 6, 2017 at 8:01 AM, Pavan Deolasee
> wrote:
> >
> >
> > On Mon, Feb 6, 2017 at 1:44 AM, Tom Lane wrote:
> >>
> >>
> >>
> >> > 2. In the second patch, we tried
Andres Freund writes:
> I've not yet read the full thread, but I'm a bit confused so far. We
> obviously can get changing information about indexes here, but isn't
> that something we have to deal with anyway? If we guarantee that we
> don't lose knowledge that there's a
On 2017-02-06 08:08:01 +0530, Amit Kapila wrote:
> I don't see in your patch that you are setting rd_bitmapsvalid to 0.
IIRC a plain relcache rebuild should do that (note there's also no place
that directly resets rd_indexattrs).
> Also, I think this suffers from the same problem as the patch
On Sun, Feb 5, 2017 at 6:42 PM, Pavan Deolasee wrote:
> I'm not sure that just because the bug wasn't reported by a user, makes it
> any less critical. As Tomas pointed down thread, the nature of the bug is
> such that the users may not discover it very easily, but that
On Mon, Feb 6, 2017 at 8:01 AM, Pavan Deolasee wrote:
>
>
> On Mon, Feb 6, 2017 at 1:44 AM, Tom Lane wrote:
>>
>>
>>
>> > 2. In the second patch, we tried to recompute attribute lists if a
>> > relcache
>> > flush happens in between and index list is
On Mon, Feb 6, 2017 at 5:41 AM, Peter Geoghegan wrote:
> On Sun, Feb 5, 2017 at 4:09 PM, Robert Haas wrote:
> > I don't think this kind of black-and-white thinking is very helpful.
> > Obviously, data corruption is bad. However, this bug has (from what
> >
On Mon, Feb 6, 2017 at 6:27 AM, Andres Freund wrote:
> Hi,
>
> On 2017-02-05 16:37:33 -0800, Andres Freund wrote:
>> > RelationGetIndexList(Relation relation)
>> > @@ -4746,8 +4747,10 @@ RelationGetIndexPredicate(Relation relat
>> > * we can include system attributes (e.g.,
On Mon, Feb 6, 2017 at 1:44 AM, Tom Lane wrote:
>
>
> > 2. In the second patch, we tried to recompute attribute lists if a
> relcache
> > flush happens in between and index list is invalidated. We've seen
> problems
> > with that, especially it getting into an infinite loop
On Sun, Feb 5, 2017 at 4:57 PM, Tomas Vondra
wrote:
> OTOH I disagree with the notion that bugs that are not driven by user
> reports are somehow less severe. Some data corruption bugs cause quite
> visible breakage - segfaults, immediate crashes, etc. Those are
Hi,
On 2017-02-05 16:37:33 -0800, Andres Freund wrote:
> > RelationGetIndexList(Relation relation)
> > @@ -4746,8 +4747,10 @@ RelationGetIndexPredicate(Relation relat
> > * we can include system attributes (e.g., OID) in the bitmap
> > representation.
> > *
> > * Caller had better hold at
On 02/06/2017 01:11 AM, Peter Geoghegan wrote:
On Sun, Feb 5, 2017 at 4:09 PM, Robert Haas wrote:
I don't think this kind of black-and-white thinking is very
helpful. Obviously, data corruption is bad. However, this bug has
(from what one can tell from this thread) been
On 2017-02-05 15:14:59 -0500, Tom Lane wrote:
> I do not like any of the other patches proposed in this thread, because
> they fail to guarantee delivering an up-to-date attribute bitmap to the
> caller. I think we need a retry loop, and I think that it needs to look
> like the attached.
That
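The retry-loop shape Tom Lane argues for can be sketched like this: if an invalidation arrives while the attribute sets are being computed, throw the partial result away and recompute. `Catalog` and its `inval_count` are hypothetical stand-ins for the relcache and its invalidation signalling, not PostgreSQL internals.

```python
# Minimal retry-loop sketch: a result is only trusted if no
# invalidation fired between starting and finishing the computation.

class Catalog:
    def __init__(self):
        self.inval_count = 0
        self.columns = ["id"]
        self._pending_ddl = True  # simulate one concurrent CREATE INDEX

    def read_indexed_columns(self):
        # The concurrent DDL lands mid-read the first time, firing an
        # invalidation exactly as in the bug scenario.
        if self._pending_ddl:
            self._pending_ddl = False
            self.columns = ["id", "email"]
            self.inval_count += 1
        return list(self.columns)

def compute_with_retry(cat):
    while True:
        before = cat.inval_count
        result = cat.read_indexed_columns()  # may race with DDL
        if cat.inval_count == before:
            return result  # no invalidation observed: result is consistent
        # otherwise loop: stale and fresh data must never be mixed

print(compute_with_retry(Catalog()))  # → ['id', 'email']
```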
On 2017-02-05 12:51:09 -0500, Tom Lane wrote:
> Michael Paquier writes:
> > On Sun, Feb 5, 2017 at 6:53 PM, Pavel Stehule
> > wrote:
> >> I agree with Pavan - a release with known important bug is not good idea.
>
> > This bug has been around
On Sun, Feb 5, 2017 at 4:09 PM, Robert Haas wrote:
> I don't think this kind of black-and-white thinking is very helpful.
> Obviously, data corruption is bad. However, this bug has (from what
> one can tell from this thread) been with us for over a decade; it must
>
On Sun, Feb 5, 2017 at 1:34 PM, Martín Marqués wrote:
> On 2/5/17 at 10:03, Michael Paquier wrote:
>> On Sun, Feb 5, 2017 at 6:53 PM, Pavel Stehule
>> wrote:
>>> I agree with Pavan - a release with known important bug is not good idea.
>>
[ Having now read the whole thread, I'm prepared to weigh in ... ]
Pavan Deolasee writes:
> This seems like a real problem to me. I don't know what the consequences
> are, but definitely having various attribute lists to have different view
> of the set of indexes
On 2/5/17 at 10:03, Michael Paquier wrote:
> On Sun, Feb 5, 2017 at 6:53 PM, Pavel Stehule wrote:
>> I agree with Pavan - a release with known important bug is not good idea.
>
> This bug has been around for some time, so I would recommend taking
> the time
2017-02-05 18:51 GMT+01:00 Tom Lane:
> Michael Paquier writes:
> > On Sun, Feb 5, 2017 at 6:53 PM, Pavel Stehule
> wrote:
> >> I agree with Pavan - a release with known important bug is not good
> idea.
>
> > This bug has
Michael Paquier writes:
> On Sun, Feb 5, 2017 at 6:53 PM, Pavel Stehule wrote:
>> I agree with Pavan - a release with known important bug is not good idea.
> This bug has been around for some time, so I would recommend taking
> the time
On Sun, Feb 5, 2017 at 6:53 PM, Pavel Stehule wrote:
> I agree with Pavan - a release with known important bug is not good idea.
This bug has been around for some time, so I would recommend taking
the time necessary to make the best fix possible, even if it means
waiting
2017-02-05 7:54 GMT+01:00 Pavan Deolasee:
>
> On Sat, Feb 4, 2017 at 11:54 PM, Tom Lane wrote:
>
>>
>> Based on Pavan's comments, I think trying to force this into next week's
>> releases would be extremely unwise. If the bug went undetected this
On Sat, Feb 4, 2017 at 11:54 PM, Tom Lane wrote:
>
> Based on Pavan's comments, I think trying to force this into next week's
> releases would be extremely unwise. If the bug went undetected this long,
> it can wait for a fix for another three months.
Yes, I think bug
Alvaro Herrera writes:
> I intend to commit this soon to all branches, to ensure it gets into the
> next set of minors.
Based on Pavan's comments, I think trying to force this into next week's
releases would be extremely unwise. If the bug went undetected this long,
it
On Sat, Feb 4, 2017 at 12:10 PM, Amit Kapila
wrote:
>
>
> If we do above, then I think primary key attrs won't be returned
> because for those we are using relation copy rather than an original
> working copy of attrs. See code below:
>
> switch (attrKind)
> {
> ..
>
On Sat, Feb 4, 2017 at 12:12 AM, Alvaro Herrera
wrote:
> Pavan Deolasee wrote:
>
>> Looking at the history and some past discussions, it seems Tomas reported
>> somewhat similar problem and Andres proposed a patch here
>>
Pavan Deolasee wrote:
> Looking at the history and some past discussions, it seems Tomas reported
> somewhat similar problem and Andres proposed a patch here
> https://www.postgresql.org/message-id/20140514155204.ge23...@awork2.anarazel.de
> which got committed via b23b0f5588d2e2. Not exactly the
On Thu, Feb 2, 2017 at 10:14 PM, Alvaro Herrera
wrote:
>
>
> I'm going to study the bug a bit more, and put in some patch before the
> upcoming minor tag on Monday.
>
>
Looking at the history and some past discussions, it seems Tomas reported
somewhat similar problem
On Thu, Feb 2, 2017 at 6:14 PM, Amit Kapila wrote:
>
> /*
> + * If the index list was invalidated, we better also invalidate the index
> + * attribute list (which should automatically invalidate other attributes
> + * such as primary key and replica identity)
> + */
>
Pavan Deolasee wrote:
> I can reproduce this entire scenario using gdb sessions. This also explains
> why the patch I sent earlier helps to solve the problem.
Ouch. Great detective work there.
I think it's quite possible that this bug explains some index errors,
such as primary keys (or unique
On Mon, Jan 30, 2017 at 7:20 PM, Pavan Deolasee
wrote:
>
> Based on my investigation so far and the evidence that I've collected,
> what probably happens is that after a relcache invalidation arrives at the
> new transaction, it recomputes the rd_indexattr but based on
On Mon, Jan 30, 2017 at 7:20 PM, Pavan Deolasee
wrote:
> Hello All,
>
> While stress testing WARM, I encountered a data consistency problem. It
> happens when index is built concurrently. What I thought to be a WARM
> induced bug and it took me significant amount of
For the record, here are the results of our (ongoing) investigation into
the index/heap corruption problems I reported a couple of weeks ago.
We were able to trigger the problem with kernels 2.6.16, 2.6.17 and
2.6.18.rc1, with 2.6.16 seeming to be the most flaky.
By replacing the NFS-mounted
On 6/30/06, Tom Lane [EMAIL PROTECTED] wrote:
I don't know the kernel nearly well enough to guess if these are related
...
The sl_log_* tables are indexed on xid, where the relations between
values are not exactly stable. When having high enough activity on
one node or having nodes with XIDs
Marko Kreen [EMAIL PROTECTED] writes:
The sl_log_* tables are indexed on xid, where the relations between
values are not exactly stable. When having high enough activity on
one node or having nodes with XIDs on different enough positions
funny things happen.
Yeah, that was one of the first
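Marko's point about xid ordering being unstable can be made concrete: PostgreSQL compares 32-bit transaction IDs in a circular modulo-2^32 space, and circular order is not transitive, so a btree over raw xids has no total ordering once values drift far enough apart. The helper below mirrors the shape of that comparison rule as a sketch.

```python
# Circular xid comparison (same shape as TransactionIdPrecedes):
# a precedes b iff the signed 32-bit difference (a - b) is negative.

def xid_precedes(a, b):
    diff = (a - b) & 0xFFFFFFFF
    if diff >= 0x80000000:      # reinterpret as signed 32-bit
        diff -= 0x100000000
    return diff < 0

# Three xids spaced roughly a third of the way around the circle
# form a cycle: each one "precedes" the next, including c -> a.
a, b, c = 0x00000000, 0x60000000, 0xC0000000
print(xid_precedes(a, b), xid_precedes(b, c), xid_precedes(c, a))  # → True True True
```

A cycle like this is exactly the "funny things" scenario: the index's comparison function stops agreeing with itself across the keyspace.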
I trawled through the first, larger dump you sent me, and found multiple
index entries pointing to quite a few heap tuples:
Occurrences  Block  Item
2            43961  1
2            43961  2
2            43961  3
2            43961  4
On 6/30/2006 9:55 AM, Tom Lane wrote:
Marko Kreen [EMAIL PROTECTED] writes:
The sl_log_* tables are indexed on xid, where the relations between
values are not exactly stable. When having high enough activity on
one node or having nodes with XIDs on different enough positions
funny things
On 6/30/06, Jan Wieck [EMAIL PROTECTED] wrote:
With the final implementation of log switching, the problem of xxid
wraparound will be avoided entirely. Every now and then slony will
switch from one to another log table and when the old one is drained and
logically empty, it is truncated, which
On 6/30/2006 11:17 AM, Marko Kreen wrote:
On 6/30/06, Jan Wieck [EMAIL PROTECTED] wrote:
With the final implementation of log switching, the problem of xxid
wraparound will be avoided entirely. Every now and then slony will
switch from one to another log table and when the old one is drained
Jan Wieck [EMAIL PROTECTED] writes:
On 6/30/2006 11:17 AM, Marko Kreen wrote:
If the xxid-s come from different DB-s, then there can still be problems.
How so? They are always part of a multi-key index having the
originating node ID first.
Really?
create table @[EMAIL PROTECTED] (
Tom Lane wrote:
Marc Munro [EMAIL PROTECTED] writes:
I'll get back to you with kernel build information tomorrow. We'll also
try to talk to some kernel hackers about this.
Some googling turned up recent discussions about race conditions in
Linux NFS code:
On 6/30/2006 11:55 AM, Tom Lane wrote:
Jan Wieck [EMAIL PROTECTED] writes:
On 6/30/2006 11:17 AM, Marko Kreen wrote:
If the xxid-s come from different DB-s, then there can still be problems.
How so? They are always part of a multi-key index having the
originating node ID first.
Really?
Brad Nicholson wrote:
Tom Lane wrote:
Marc Munro [EMAIL PROTECTED] writes:
I'll get back to you with kernel build information tomorrow. We'll also
try to talk to some kernel hackers about this.
Some googling turned up recent discussions about race conditions in
Linux NFS code:
Brad Nicholson [EMAIL PROTECTED] writes:
It may or may not be the same issue, but for what it's worth, we've seen
the same sl_log_1 corruption on AIX 5.1 and 5.3
Hm, on what filesystem, and what PG version(s)?
I'm not completely satisfied by the its-a-kernel-bug theory, because if
it were
Tom Lane wrote:
Brad Nicholson [EMAIL PROTECTED] writes:
It may or may not be the same issue, but for what it's worth, we've seen
the same sl_log_1 corruption on AIX 5.1 and 5.3
Hm, on what filesystem, and what PG version(s)?
I'm not completely satisfied by the its-a-kernel-bug theory,
Jan Wieck [EMAIL PROTECTED] writes:
You're right ... forgot about that one.
However, transactions from different origins are NEVER selected together
and it wouldn't make sense to compare their xid's anyway. So the index
might return index tuples for rows from another origin, but the
On Fri, 2006-06-30 at 12:05, Jan Wieck wrote:
On 6/30/2006 11:55 AM, Tom Lane wrote:
Jan Wieck [EMAIL PROTECTED] writes:
On 6/30/2006 11:17 AM, Marko Kreen wrote:
If the xxid-s come from different DB-s, then there can still be problems.
How so? They are allways
On Thu, 2006-06-29 at 21:47 -0400, Tom Lane wrote:
One easy thing that would be worth trying is to build with
--enable-cassert and see if any Asserts get provoked during the
A couple other things to try, given that you can provoke the failure
fairly easily:
. . .
1. In studying the code, it
Marc Munro [EMAIL PROTECTED] writes:
We tried all of these suggestions and still get the problem. Nothing
interesting in the log file so I guess the Asserts did not fire.
Not surprising, it was a long shot that any of those things were really
broken. But worth testing.
We are going to try
On Tom Lane's advice, we upgraded to Postgres 8.0.8. We also upgraded
slony to 1.1.5, due to some rpm issues. Apart from that everything is
as described below.
We were able to corrupt the index within 90 minutes of starting up our
cluster. A slony-induced vacuum was under way on the provider
Marc Munro [EMAIL PROTECTED] writes:
On Tom Lane's advice, we upgraded to Postgres 8.0.8.
OK, so it's not an already-known problem.
We were able to corrupt the index within 90 minutes of starting up our
cluster. A slony-induced vacuum was under way on the provider at the
time the subscriber
On Thu, 2006-06-29 at 12:11 -0400, Tom Lane wrote:
OK, so it's not an already-known problem.
We were able to corrupt the index within 90 minutes of starting up our
cluster. A slony-induced vacuum was under way on the provider at the
time the subscriber failed.
You still haven't given
Marc Munro [EMAIL PROTECTED] writes:
As you see, slony is attempting to enter one tuple
('374520943','22007','0') two times.
Each previous time we have had this problem, rebuilding the indexes on
slony log table (sl_log_1) has fixed the problem. I have not reindexed
the table this time as I
We have reproduced the problem again. This time it looks like vacuum is
not part of the picture. From the provider's log:
2006-06-29 14:02:41 CST DEBUG2 cleanupThread: 101.057 seconds for vacuuming
And from the subscriber's:
2006-06-29 13:00:43 PDT ERROR remoteWorkerThread_1: insert into
[EMAIL PROTECTED] (Marc Munro) writes:
As you see, slony is attempting to enter one tuple
('374520943','22007','0') two times.
Each previous time we have had this problem, rebuilding the indexes on
slony log table (sl_log_1) has fixed the problem. I have not reindexed
the table this time as
Marc Munro [EMAIL PROTECTED] writes:
Tom,
we have a newer and much smaller (35M) file showing the same thing:
Thanks. Looking into this, what I find is that *both* indexes have
duplicated entries for the same heap tuple:
idx1:
Item 190 -- Length: 24 Offset: 3616 (0x0e20) Flags: USED
On Thu, 2006-06-29 at 16:42, Chris Browne wrote:
[EMAIL PROTECTED] (Marc Munro) writes:
As you see, slony is attempting to enter one tuple
('374520943','22007','0') two times.
Each previous time we have had this problem, rebuilding the indexes on
slony log table
On Thu, 2006-06-29 at 17:23, Tom Lane wrote:
Marc Munro [EMAIL PROTECTED] writes:
Tom,
we have a newer and much smaller (35M) file showing the same thing:
Thanks. Looking into this, what I find is that *both* indexes have
duplicated entries for the same heap tuple:
On Fri, 2006-06-30 at 00:37 +0300, Hannu Krosing wrote:
Marc: do you have triggers on some replicated tables ?
We have a non-slony trigger on only 2 tables, neither of them involved
in this transaction. We certainly have no circular trigger structures.
I remember having some corruption in a
I wrote:
What I speculate right at the moment is that we are not looking at index
corruption at all, but at heap corruption: somehow, the first insertion
into ctid (27806,2) got lost and the same ctid got re-used for the next
inserted row. We fixed one bug like this before ...
Further study
On Thu, 2006-06-29 at 19:59 -0400, Tom Lane wrote:
Ummm ... you did restart the server? select version(); would be
the definitive test.
Can't say I blame you for the skepticism but I have confirmed it again.
test=# select version();
version
Marc Munro [EMAIL PROTECTED] writes:
If there's anything we can do to help debug this we will. We can apply
patches, different build options, etc.
One easy thing that would be worth trying is to build with
--enable-cassert and see if any Asserts get provoked during the
failure case. I don't
[ back to the start of the thread... ]
Marc Munro [EMAIL PROTECTED] writes:
We have now experienced index corruption on two separate but identical
slony clusters. In each case the slony subscriber failed after
attempting to insert a duplicate record. In each case reindexing the
sl_log_1
On Thu, 2006-06-29 at 21:47 -0400, Tom Lane wrote:
One easy thing that would be worth trying is to build with
--enable-cassert and see if any Asserts get provoked during the
failure case. I don't have a lot of hope for that, but it's
something that would require only machine time not people
Marc Munro [EMAIL PROTECTED] writes:
By dike out, you mean remove? Please confirm and I'll try it.
Right, just remove (or comment out) the lines I quoted.
We ran this system happily for nearly a year on the
previous kernel without experiencing this problem (tcp lockups are a
different
On Thu, 2006-06-29 at 21:59 -0400, Tom Lane wrote:
[ back to the start of the thread... ]
BTW, a couple of thoughts here:
* If my theory about the low-level cause is correct, then reindexing
sl_log_1 would make the duplicate key errors go away, but nonetheless
you'd have lost data --- the
Marc Munro [EMAIL PROTECTED] writes:
I'll get back to you with kernel build information tomorrow. We'll also
try to talk to some kernel hackers about this.
Some googling turned up recent discussions about race conditions in
Linux NFS code:
Marc Munro [EMAIL PROTECTED] writes:
We have now experienced index corruption on two separate but identical
slony clusters. In each case the slony subscriber failed after
attempting to insert a duplicate record. In each case reindexing the
sl_log_1 table on the provider fixed the problem.
On Feb 13, 2003, Tom Lane wrote:
Laurette Cisneros [EMAIL PROTECTED] writes:
This is the error in the pgsql log:
2003-02-13 16:21:42 [8843] ERROR: Index external_signstops_pkey is
not a btree
This says that one of two fields that should never change, in fixed
positions in the first
On Monday March 31 2003 3:38, Ed L. wrote:
On Feb 13, 2003, Tom Lane wrote:
Laurette Cisneros [EMAIL PROTECTED] writes:
This is the error in the pgsql log:
2003-02-13 16:21:42 [8843] ERROR: Index external_signstops_pkey is
not a btree
This says that one of two fields that should