On 2/19/17 5:27 AM, Robert Haas wrote:
(1) a multi-batch hash join, (2) a nested loop,
and (3) a merge join. (2) is easy to implement but will generate a
ton of random I/O if the table is not resident in RAM. (3) is most
suitable for very large tables but takes more work to code, and is
also
On Sun, Feb 19, 2017 at 3:52 PM, Pavan Deolasee
wrote:
> This particular case of corruption results in a heap tuple getting indexed
> by a wrong key (or to be precise, indexed by its old value). So the only way
> to detect the corruption is to look at each index key and
On Sun, Feb 19, 2017 at 3:43 PM, Robert Haas wrote:
> On Fri, Feb 17, 2017 at 11:15 PM, Tom Lane wrote:
>
> > Ah, nah, scratch that. If any post-index-build pruning had occurred on a
> > page, the evidence would be gone --- the non-matching older
On Fri, Feb 17, 2017 at 11:15 PM, Tom Lane wrote:
> I wrote:
>> However, you might be able to find it without so much random I/O.
>> I'm envisioning a seqscan over the table, in which you simply look for
>> HOT chains in which the indexed columns aren't all the same. When you
On Fri, Feb 17, 2017 at 9:31 AM, Tom Lane wrote:
> This seems like it'd be quite a different tool than amcheck, though.
> Also, it would only find broken-HOT-chain corruption, which might be
> a rare enough issue to not deserve a single-purpose tool.
FWIW, my ambition for
I wrote:
> However, you might be able to find it without so much random I/O.
> I'm envisioning a seqscan over the table, in which you simply look for
> HOT chains in which the indexed columns aren't all the same. When you
> find one, you'd have to do a pretty expensive index lookup to confirm
>
Peter Geoghegan writes:
> The difference with a test that could detect this variety of
> corruption is that that would need to visit the heap, which tends to
> be much larger than any one index, or even all indexes. That would
> probably need to be random I/O, too. It might be
On Fri, Feb 17, 2017 at 8:23 AM, Keith Fiske wrote:
> It's not the load I'm worried about, it's the locks that are required at
> some point during the rebuild. Doing an index rebuild here and there isn't a
> big deal, but trying to do it for an entire heavily loaded,
Keith Fiske wrote:
> I can understandable if it's simply not possible, but if it is, I think in
> any cases of data corruption, having some means to check for it to be sure
> you're in the clear would be useful.
Maybe it is possible. I just didn't try, since it didn't seem very
useful.
--
On Fri, Feb 17, 2017 at 11:12 AM, Alvaro Herrera
wrote:
> Keith Fiske wrote:
>
> > Was just curious if anyone was able to come up with any sort of method to
> > test whether an index was corrupted by this bug, other than just waiting
> > for bad query results? We've
Keith Fiske wrote:
> Was just curious if anyone was able to come up with any sort of method to
> test whether an index was corrupted by this bug, other than just waiting
> for bad query results? We've used concurrent index rebuilding quite
> extensively over the years to remove bloat from busy
On Mon, Feb 6, 2017 at 10:17 PM, Amit Kapila
wrote:
> On Mon, Feb 6, 2017 at 10:28 PM, Tom Lane wrote:
> > Amit Kapila writes:
> >> Hmm. Consider that the first time relcahe invalidation occurs while
> >> computing
On Mon, Feb 6, 2017 at 10:28 PM, Tom Lane wrote:
> Amit Kapila writes:
>> Hmm. Consider that the first time relcahe invalidation occurs while
>> computing id_attrs, so now the retry logic will compute the correct
>> set of attrs (considering two
On Mon, Feb 6, 2017 at 11:54 PM, Tom Lane wrote:
> After some discussion among the release team, we've concluded that the
> best thing to do is to push Pavan's/my patch into today's releases.
> This does not close the matter by any means: we should continue to
> study whether
After some discussion among the release team, we've concluded that the
best thing to do is to push Pavan's/my patch into today's releases.
This does not close the matter by any means: we should continue to
study whether there are related bugs or whether there's a more principled
way of fixing this
On Sun, Feb 5, 2017 at 9:42 PM, Pavan Deolasee wrote:
> On Mon, Feb 6, 2017 at 5:41 AM, Peter Geoghegan wrote:
>> On Sun, Feb 5, 2017 at 4:09 PM, Robert Haas wrote:
>> > I don't think this kind of black-and-white thinking is very
Alvaro Herrera writes:
> Tom Lane wrote:
>> Better to fix the callers so that they don't have the assumption you
>> refer to. Or maybe we could adjust the API of RelationGetIndexAttrBitmap
>> so that it returns all the sets needed by a given calling module at
>> once,
Tom Lane wrote:
> Better to fix the callers so that they don't have the assumption you
> refer to. Or maybe we could adjust the API of RelationGetIndexAttrBitmap
> so that it returns all the sets needed by a given calling module at
> once, which would allow us to guarantee they're consistent.
Amit Kapila writes:
> Hmm. Consider that the first time relcahe invalidation occurs while
> computing id_attrs, so now the retry logic will compute the correct
> set of attrs (considering two indexes, if we take the example given by
> Alvaro above.). However, the attrs
Andres Freund wrote:
> To show what I mean here's an *unpolished* and *barely tested* patch
> implementing the first of my suggestions.
>
> Alvaro, Pavan, I think should address the issue as well?
Hmm, interesting idea. Maybe a patch along these lines is a good way to
make index-list cache
Tom Lane wrote:
> Pavan Deolasee writes:
> > 2. In the second patch, we tried to recompute attribute lists if a relcache
> > flush happens in between and index list is invalidated. We've seen problems
> > with that, especially it getting into an infinite loop with
> >
El 05/02/17 a las 21:57, Tomas Vondra escribió:
>
> +1 to not rushing fixes into releases. While I think we now finally
> understand the mechanics of this bug, the fact that we came up with
> three different fixes in this thread, only to discover issues with each
> of them, warrants some caution.
> 6 февр. 2017 г., в 4:57, Peter Geoghegan написал(а):
>
> I meant that I find the fact that there were no user reports in all
> these years to be a good reason to not proceed for now in this
> instance.
Well, we had some strange situations with indexes (see below for example)
On Mon, Feb 6, 2017 at 9:47 AM, Pavan Deolasee wrote:
>
>
> On Mon, Feb 6, 2017 at 9:41 AM, Amit Kapila wrote:
>>
>>
>>
>> Hmm. Consider that the first time relcahe invalidation occurs while
>> computing id_attrs, so now the retry logic will
On Mon, Feb 6, 2017 at 8:01 AM, Pavan Deolasee
wrote:
>
>>
> I like this approach. I'll run the patch on a build with
> CACHE_CLOBBER_ALWAYS, but I'm pretty sure it will be ok.
>
While it looks certain that the fix will miss this point release, FWIW I
ran this patch
On Mon, Feb 6, 2017 at 9:41 AM, Amit Kapila wrote:
>
>
> Hmm. Consider that the first time relcahe invalidation occurs while
> computing id_attrs, so now the retry logic will compute the correct
> set of attrs (considering two indexes, if we take the example given by
>
On Mon, Feb 6, 2017 at 8:35 AM, Pavan Deolasee wrote:
>
>
> On Mon, Feb 6, 2017 at 8:15 AM, Amit Kapila wrote:
>>
>> On Mon, Feb 6, 2017 at 8:01 AM, Pavan Deolasee
>> wrote:
>> >
>> >
>> > On Mon, Feb 6, 2017 at 1:44
On 2017-02-05 22:34:34 -0500, Tom Lane wrote:
> Pavan Deolasee writes:
> The point is that there's a nontrivial chance of a hasty fix introducing
> worse problems than we fix.
>
> Given the lack of consensus about exactly how to fix this, I'm feeling
> like it's a good
Pavan Deolasee writes:
> On Mon, Feb 6, 2017 at 5:44 AM, Andres Freund wrote:
>> +1. I don't think we serve our users by putting out a nontrivial bugfix
>> hastily. Nor do I think we serve them in this instance by delaying the
>> release to cover
On 2017-02-05 21:49:57 -0500, Tom Lane wrote:
> Andres Freund writes:
> > I've not yet read the full thread, but I'm a bit confused so far. We
> > obviously can get changing information about indexes here, but isn't
> > that something we have to deal with anyway? If we
On Mon, Feb 6, 2017 at 5:44 AM, Andres Freund wrote:
> On 2017-02-05 12:51:09 -0500, Tom Lane wrote:
> > Michael Paquier writes:
> > > On Sun, Feb 5, 2017 at 6:53 PM, Pavel Stehule
> wrote:
> > >> I agree with Pavan - a
On Mon, Feb 6, 2017 at 8:15 AM, Amit Kapila wrote:
> On Mon, Feb 6, 2017 at 8:01 AM, Pavan Deolasee
> wrote:
> >
> >
> > On Mon, Feb 6, 2017 at 1:44 AM, Tom Lane wrote:
> >>
> >>
> >>
> >> > 2. In the second patch, we tried
Andres Freund writes:
> I've not yet read the full thread, but I'm a bit confused so far. We
> obviously can get changing information about indexes here, but isn't
> that something we have to deal with anyway? If we guarantee that we
> don't loose knowledge that there's a
On 2017-02-06 08:08:01 +0530, Amit Kapila wrote:
> I don't see in your patch that you are setting rd_bitmapsvalid to 0.
IIRC a plain relcache rebuild should do that (note there's also no place
that directly resets rd_indexattrs).
> Also, I think this suffers from the same problem as the patch
On Sun, Feb 5, 2017 at 6:42 PM, Pavan Deolasee wrote:
> I'm not sure that just because the bug wasn't reported by a user, makes it
> any less critical. As Tomas pointed down thread, the nature of the bug is
> such that the users may not discover it very easily, but that
On Mon, Feb 6, 2017 at 8:01 AM, Pavan Deolasee wrote:
>
>
> On Mon, Feb 6, 2017 at 1:44 AM, Tom Lane wrote:
>>
>>
>>
>> > 2. In the second patch, we tried to recompute attribute lists if a
>> > relcache
>> > flush happens in between and index list is
On Mon, Feb 6, 2017 at 5:41 AM, Peter Geoghegan wrote:
> On Sun, Feb 5, 2017 at 4:09 PM, Robert Haas wrote:
> > I don't think this kind of black-and-white thinking is very helpful.
> > Obviously, data corruption is bad. However, this bug has (from what
> >
On Mon, Feb 6, 2017 at 6:27 AM, Andres Freund wrote:
> Hi,
>
> On 2017-02-05 16:37:33 -0800, Andres Freund wrote:
>> > RelationGetIndexList(Relation relation)
>> > @@ -4746,8 +4747,10 @@ RelationGetIndexPredicate(Relation relat
>> > * we can include system attributes (e.g.,
On Mon, Feb 6, 2017 at 1:44 AM, Tom Lane wrote:
>
>
> > 2. In the second patch, we tried to recompute attribute lists if a
> relcache
> > flush happens in between and index list is invalidated. We've seen
> problems
> > with that, especially it getting into an infinite loop
On Sun, Feb 5, 2017 at 4:57 PM, Tomas Vondra
wrote:
> OTOH I disagree with the notion that bugs that are not driven by user
> reports are somehow less severe. Some data corruption bugs cause quite
> visible breakage - segfaults, immediate crashes, etc. Those are
Hi,
On 2017-02-05 16:37:33 -0800, Andres Freund wrote:
> > RelationGetIndexList(Relation relation)
> > @@ -4746,8 +4747,10 @@ RelationGetIndexPredicate(Relation relat
> > * we can include system attributes (e.g., OID) in the bitmap
> > representation.
> > *
> > * Caller had better hold at
On 02/06/2017 01:11 AM, Peter Geoghegan wrote:
On Sun, Feb 5, 2017 at 4:09 PM, Robert Haas wrote:
I don't think this kind of black-and-white thinking is very
helpful. Obviously, data corruption is bad. However, this bug has
(from what one can tell from this thread) been
On 2017-02-05 15:14:59 -0500, Tom Lane wrote:
> I do not like any of the other patches proposed in this thread, because
> they fail to guarantee delivering an up-to-date attribute bitmap to the
> caller. I think we need a retry loop, and I think that it needs to look
> like the attached.
That
On 2017-02-05 12:51:09 -0500, Tom Lane wrote:
> Michael Paquier writes:
> > On Sun, Feb 5, 2017 at 6:53 PM, Pavel Stehule
> > wrote:
> >> I agree with Pavan - a release with known important bug is not good idea.
>
> > This bug has been around
On Sun, Feb 5, 2017 at 4:09 PM, Robert Haas wrote:
> I don't think this kind of black-and-white thinking is very helpful.
> Obviously, data corruption is bad. However, this bug has (from what
> one can tell from this thread) been with us for over a decade; it must
>
On Sun, Feb 5, 2017 at 1:34 PM, Martín Marqués wrote:
> El 05/02/17 a las 10:03, Michael Paquier escribió:
>> On Sun, Feb 5, 2017 at 6:53 PM, Pavel Stehule
>> wrote:
>>> I agree with Pavan - a release with known important bug is not good idea.
>>
[ Having now read the whole thread, I'm prepared to weigh in ... ]
Pavan Deolasee writes:
> This seems like a real problem to me. I don't know what the consequences
> are, but definitely having various attribute lists to have different view
> of the set of indexes
El 05/02/17 a las 10:03, Michael Paquier escribió:
> On Sun, Feb 5, 2017 at 6:53 PM, Pavel Stehule wrote:
>> I agree with Pavan - a release with known important bug is not good idea.
>
> This bug has been around for some time, so I would recommend taking
> the time
2017-02-05 18:51 GMT+01:00 Tom Lane :
> Michael Paquier writes:
> > On Sun, Feb 5, 2017 at 6:53 PM, Pavel Stehule
> wrote:
> >> I agree with Pavan - a release with known important bug is not good
> idea.
>
> > This bug has
Michael Paquier writes:
> On Sun, Feb 5, 2017 at 6:53 PM, Pavel Stehule wrote:
>> I agree with Pavan - a release with known important bug is not good idea.
> This bug has been around for some time, so I would recommend taking
> the time
On Sun, Feb 5, 2017 at 6:53 PM, Pavel Stehule wrote:
> I agree with Pavan - a release with known important bug is not good idea.
This bug has been around for some time, so I would recommend taking
the time necessary to make the best fix possible, even if it means
waiting
2017-02-05 7:54 GMT+01:00 Pavan Deolasee :
>
> On Sat, Feb 4, 2017 at 11:54 PM, Tom Lane wrote:
>
>>
>> Based on Pavan's comments, I think trying to force this into next week's
>> releases would be extremely unwise. If the bug went undetected this
On Sat, Feb 4, 2017 at 11:54 PM, Tom Lane wrote:
>
> Based on Pavan's comments, I think trying to force this into next week's
> releases would be extremely unwise. If the bug went undetected this long,
> it can wait for a fix for another three months.
Yes, I think bug
Alvaro Herrera writes:
> I intend to commit this soon to all branches, to ensure it gets into the
> next set of minors.
Based on Pavan's comments, I think trying to force this into next week's
releases would be extremely unwise. If the bug went undetected this long,
it
On Sat, Feb 4, 2017 at 12:10 PM, Amit Kapila
wrote:
>
>
> If we do above, then I think primary key attrs won't be returned
> because for those we are using relation copy rather than an original
> working copy of attrs. See code below:
>
> switch (attrKind)
> {
> ..
>
On Sat, Feb 4, 2017 at 12:12 AM, Alvaro Herrera
wrote:
> Pavan Deolasee wrote:
>
>> Looking at the history and some past discussions, it seems Tomas reported
>> somewhat similar problem and Andres proposed a patch here
>>
Pavan Deolasee wrote:
> Looking at the history and some past discussions, it seems Tomas reported
> somewhat similar problem and Andres proposed a patch here
> https://www.postgresql.org/message-id/20140514155204.ge23...@awork2.anarazel.de
> which got committed via b23b0f5588d2e2. Not exactly the
On Thu, Feb 2, 2017 at 10:14 PM, Alvaro Herrera
wrote:
>
>
> I'm going to study the bug a bit more, and put in some patch before the
> upcoming minor tag on Monday.
>
>
Looking at the history and some past discussions, it seems Tomas reported
somewhat similar problem
On Thu, Feb 2, 2017 at 6:14 PM, Amit Kapila wrote:
>
> /*
> + * If the index list was invalidated, we better also invalidate the index
> + * attribute list (which should automatically invalidate other attributes
> + * such as primary key and replica identity)
> + */
>
Pavan Deolasee wrote:
> I can reproduce this entire scenario using gdb sessions. This also explains
> why the patch I sent earlier helps to solve the problem.
Ouch. Great detective work there.
I think it's quite possible that this bug explains some index errors,
such as primary keys (or unique
On Mon, Jan 30, 2017 at 7:20 PM, Pavan Deolasee
wrote:
>
> Based on my investigation so far and the evidence that I've collected,
> what probably happens is that after a relcache invalidation arrives at the
> new transaction, it recomputes the rd_indexattr but based on
On Mon, Jan 30, 2017 at 7:20 PM, Pavan Deolasee
wrote:
> Hello All,
>
> While stress testing WARM, I encountered a data consistency problem. It
> happens when index is built concurrently. What I thought to be a WARM
> induced bug and it took me significant amount of
Hello All,
While stress testing WARM, I encountered a data consistency problem. It
happens when index is built concurrently. What I thought to be a WARM
induced bug and it took me significant amount of investigation to finally
conclude that it's a bug in the master. In fact, we tested all the way
63 matches
Mail list logo