I recently raised "BUG #6425: Bus error in slot_deform_tuple". During the last
reproduction of the problem I saw this:
Client 2 aborted in state 0: ERROR: invalid memory alloc request size
18446744073709551613
So like Tom said, these two issues could well be related. I just wanted to
mention
Excerpts from Tom Lane's message of Wed Feb 01 18:06:27 -0300 2012:
> Robert Haas writes:
> >>> No, I wasn't thinking about a tuple descriptor mismatch. I was
> >>> imagining that the page contents themselves might be in flux while
> >>> we're trying to read from it.
>
> > It would be nice to get a dump of what PostgreSQL thought the entire
> > block looked like at the time t
Robert Haas writes:
>>> No, I wasn't thinking about a tuple descriptor mismatch. I was
>>> imagining that the page contents themselves might be in flux while
>>> we're trying to read from it.
> It would be nice to get a dump of what PostgreSQL thought the entire
> block looked like at the time t
On Wed, Feb 1, 2012 at 11:19 AM, Tom Lane wrote:
> Robert Haas writes:
>> No, I wasn't thinking about a tuple descriptor mismatch. I was
>> imagining that the page contents themselves might be in flux while
>> we're trying to read from it.
>
> Oh, gotcha. Yes, that's a horribly plausible idea.
Robert Haas writes:
> No, I wasn't thinking about a tuple descriptor mismatch. I was
> imagining that the page contents themselves might be in flux while
> we're trying to read from it.
Oh, gotcha. Yes, that's a horribly plausible idea. All it'd take is
one WAL replay routine that hasn't been
On Tue, Jan 31, 2012 at 4:25 PM, Tom Lane wrote:
> Robert Haas writes:
>> On Tue, Jan 31, 2012 at 12:05 AM, Tom Lane wrote:
>>> BTW, after a bit more reflection it occurs to me that it's not so much
>>> that the data is necessarily *bad*, as that it seemingly doesn't match
>>> the tuple descriptor that the backend's trying to interpret it with.
So here's a better stack trace for the segfault issue (again, just to
summarize, since this is a long thread, we're seeing two issues: 1) alloc
errors that do not crash the DB (although we modified postgres to panic
when this happens in our test environment, and posted a stack earlier) 2) a
postgre
Robert Haas writes:
> On Tue, Jan 31, 2012 at 12:05 AM, Tom Lane wrote:
>> BTW, after a bit more reflection it occurs to me that it's not so much
>> that the data is necessarily *bad*, as that it seemingly doesn't match
>> the tuple descriptor that the backend's trying to interpret it with.
> Hm
On Tue, Jan 31, 2012 at 12:05 AM, Tom Lane wrote:
> I wrote:
>> Hm. The stack trace is definitive that it's finding the bad data in a
>> tuple that it's trying to print to the client, not in an index.
>
> BTW, after a bit more reflection it occurs to me that it's not so much
> that the data is necessarily *bad*, as that it seemingly doesn't match
> the tuple descriptor that the backend's trying to interpret it with.
Excerpts from Bridget Frey's message of Mon Jan 30 18:59:08 -0300 2012:
> Anyway, here goes...
Maybe a "bt full" could give more insight into what's going on ...
> #0 0x003a83e30265 in raise () from /lib64/libc.so.6
> #1 0x003a83e31d10 in abort () from /lib64/libc.so.6
> #2 0x000
We have no DDL whatsoever in the code. We do update rows in the
logins table frequently, but we basically have a policy of only doing
DDL changes during scheduled upgrades when we bring the site down. We
have been discussing this issue a lot and we really haven't come up
with anything that would
I wrote:
> Hm. The stack trace is definitive that it's finding the bad data in a
> tuple that it's trying to print to the client, not in an index.
BTW, after a bit more reflection it occurs to me that it's not so much
that the data is necessarily *bad*, as that it seemingly doesn't match
the tuple descriptor that the backend's trying to interpret it with.
Bridget Frey writes:
> Thanks for the reply, we appreciate your time on this. The alloc error
> queries all seem to be selects from a btree primary index. I gave an
> example in my initial post from the logins table. Usually for us it
> is logins but sometimes we have seen it on a few other tables, and
> it's always a
Hi Tom,
Thanks for the reply, we appreciate your time on this. The alloc error
queries all seem to be selects from a btree primary index. I gave an
example in my initial post from the logins table. Usually for us it
is logins but sometimes we have seen it on a few other tables, and
it's always a
Bridget Frey writes:
> The second error is an invalid memory alloc error that we're getting ~2
> dozen times per day in production. The bt for this alloc error is below.
This trace is consistent with the idea that we're getting a corrupt
tuple out of a table, although it doesn't entirely preclude
All right, so we were able to get a full bt of the alloc error on a test
system. Also, since we have a lot of emails going around on this - I
wanted to make it clear that we're seeing *two* production errors, which
may or may not be related. (The OP for bug #6200 also sees both issues.)
One is a
On Sat, Jan 28, 2012 at 8:45 PM, Michael Brauwerman wrote:
> We did try that with a postgres 9.1.2, compiled from source with debug
> flags, but we got 0x10 bad address in gdb. (Obviously we did it wrong
> somehow)
>
> We will keep trying to get a good set of symbols set up.
Hmm. Your backtrace
We did try that with a postgres 9.1.2, compiled from source with debug
flags, but we got 0x10 bad address in gdb. (Obviously we did it wrong
somehow)
We will keep trying to get a good set of symbols set up.
On Jan 28, 2012 2:34 PM, "Peter Geoghegan" wrote:
> On 28 January 2012 21:34, Michael Brauwerman wrote:
On 28 January 2012 21:34, Michael Brauwerman wrote:
> We have the (5GB) core file, and are happy to do any more forensics anyone
> can advise.
Ideally, you'd be able to install debug information packages, which
should give a more detailed and useful stack trace, as described here:
http://wiki.po
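Once the debuginfo is in place, a session along these lines is the usual next step (paths are illustrative; point gdb at the exact binary that produced the core):

```
$ gdb /usr/local/pgsql/bin/postgres /path/to/core
(gdb) bt full
(gdb) quit
```

`bt full` dumps local variables at each frame, which is what makes it more useful than a bare `bt` for diagnosing a corrupt tuple pointer.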
I work with Bridget at Redfin.
We have a core dump from a once-in-5-days (multi-million queries) hot
standby segfault in pg 9.1.2. (It might or might not be the same root issue
as the "alloc" errors. If I should file a new bug report, let me know.)
The postgres executable that crashed did not have debug symbols
Thanks for the info - that's very helpful. We had also noted that the
alloc seems to be -3 bytes. We have run pg_check and it found no instances
of corruption. We've also replayed queries that have failed, and have never
been able to get the same query to fail twice. In the case you
investigated
On Fri, Jan 27, 2012 at 1:31 PM, Bridget Frey wrote:
> Thanks for the info - that's very helpful. We had also noted that the alloc
> seems to be -3 bytes. We have run pg_check and it found no instances of
> corruption. We've also replayed queries that have failed, and have never
> been able to get the same query to fail twice.
On Mon, Jan 23, 2012 at 3:22 PM, Bridget Frey wrote:
> Hello,
> We upgraded to postgres 9.1.2 two weeks ago, and we are also experiencing an
> issue that seems very similar to the one reported as bug 6200. We see
> approximately 2 dozen alloc errors per day across 3 slaves, and we are
> getting one segfault approximately every 3 days.
Hello,
We upgraded to postgres 9.1.2 two weeks ago, and we are also experiencing
an issue that seems very similar to the one reported as bug 6200. We see
approximately 2 dozen alloc errors per day across 3 slaves, and we are
getting one segfault approximately every 3 days. We did not experience
t
On Thu, Sep 8, 2011 at 11:33 PM, Daniel Farina wrote:
> ERROR: invalid memory alloc request size 18446744073709551613
> At least once, a hot standby was promoted to a primary and the errors seem
> to discontinue, but then reappear on a newly-provisioned standby.
So the query that fails is a bt
On 09.09.2011 18:02, Tom Lane wrote:
The way that I'd personally proceed to investigate it would probably be
to change the "invalid memory alloc request size" size errors (in
src/backend/utils/mmgr/mcxt.c; there are about four occurrences) from
ERROR to PANIC so that they'll provoke a core dump,
"Daniel Farina" writes:
> A huge thanks to Conrad Irwin of Rapportive for furnishing virtually all the
> details of this bug report.
This isn't really enough information to reproduce the problem ...
> The occurrence rate is somewhere in the one per tens-of-millions of
> queries.
... and that st
The following bug has been logged online:
Bug reference: 6200
Logged by: Daniel Farina
Email address: dan...@heroku.com
PostgreSQL version: 9.0.4
Operating system: Ubuntu 10.04
Description: standby bad memory allocations on SELECT
Details:
A huge thanks to Conrad Irwin of Rapportive for furnishing virtually all
the details of this bug report.