Re: [BUGS] BUG #6200: standby bad memory allocations on SELECT

Robert Haas Fri, 27 Jan 2012 06:32:15 -0800

On Mon, Jan 23, 2012 at 3:22 PM, Bridget Frey <bridget.f...@redfin.com> wrote:
> Hello,
> We upgraded to postgres 9.1.2 two weeks ago, and we are also experiencing an
> issue that seems very similar to the one reported as bug 6200.  We see
> approximately 2 dozen alloc errors per day across 3 slaves, and we are
> getting one segfault approximately every 3 days.  We did not experience this
> issue before our upgrade (we were on version 8.4, and used skytools for
> replication).
>
> We are attempting to get a core dump on segfault (our last attempt did not
> work due to a config issue for the core dump).  We're also attempting to
> repro the alloc errors on a test setup, but it seems like we may need quite
> a bit of load to trigger the issue.  We're not certain that the alloc issues
> and the segfaults are "the same issue" - but it seems that it may be since
> the OP for bug 6200 sees the same behavior.  We have seen no issues on the
> master, all alloc errors and segfaults have been on the slaves.
>
> We've seen the alloc errors on a few different tables, but most frequently
> on logins.  Rows are added to the logins table one-by-one, and updates
> generally happen one row at a time.  The table is pretty basic, it looks
> like this...
>
> CREATE TABLE logins
> (
>   login_id bigserial NOT NULL,
>   <snip - a bunch of columns>
>   CONSTRAINT logins_pkey PRIMARY KEY (login_id ),
>   <snip - some other constraints...>
> )
> WITH (
>   FILLFACTOR=80,
>   OIDS=FALSE
> );
>
> The queries that trigger the alloc error on this table look like this (we
> use Hibernate, hence the funny underscoring...)
> select login0_.login_id as login1_468_0_, l...  from logins login0_ where
> login0_.login_id=$1
>
> The alloc error in the logs looks like this:
> -01-12_080925.log:2012-01-12 17:33:46 PST [16034]: [7-1] [24/25934] ERROR:
> invalid memory alloc request size 18446744073709551613
>
> The alloc error is nearly always for size 18446744073709551613 - though we
> have seen one time where it was a different amount...


Hmm, that number in hex works out to 0xfffffffffffffffd, which is -3
when interpreted as a signed 64-bit integer.  That makes it sound an
awful lot like the system (for some unknown reason) attempted to
allocate -3 bytes of memory.  I've seen something like
this once before on a customer system running a modified version of
PostgreSQL.  In that case, the problem turned out to be page
corruption.  Circumstances didn't permit determination of the root
cause of the page corruption, however, nor was I able to figure out
exactly how the corruption I saw resulted in an allocation request
like this.  It would be nice to figure out where in the code this is
happening and put in a higher-level guard so that we get a better
error message.
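
To make the arithmetic concrete, here is a small standalone C sketch.
It assumes a 64-bit unsigned size, and its names are hypothetical.  The
size cap is modeled on PostgreSQL's MaxAllocSize (1 gigabyte - 1).  It
shows how a length that comes out 3 below zero wraps to exactly
18446744073709551613 and then fails a palloc-style size check.  The
"stored length minus a 4-byte header" step is only an illustration of
one way a negative length could arise.  It is not a claim about where
the bad request actually came from.

```c
#include <assert.h>
#include <stdint.h>

/* Modeled on PostgreSQL's MaxAllocSize (1 gigabyte - 1). */
#define MAX_ALLOC_SIZE UINT64_C(0x3fffffff)

/* Same idea as a palloc-style size check: anything above the cap is rejected. */
static int
alloc_size_is_valid(uint64_t size)
{
    return size <= MAX_ALLOC_SIZE;
}

/*
 * Hypothetical: derive a payload size by subtracting a header size from a
 * stored length.  If the stored length is garbage (say 1 with a 4-byte
 * header), the difference is -3, and converting it to an unsigned 64-bit
 * value wraps modulo 2^64 to 18446744073709551613 (0xfffffffffffffffd).
 */
static uint64_t
payload_size(int32_t stored_len, int32_t header_size)
{
    int64_t diff = (int64_t) stored_len - header_size;

    return (uint64_t) diff;
}
```

Calling payload_size(1, 4) yields 18446744073709551613, and
alloc_size_is_valid() rejects it.  That is the same shape as the error
in the log above, so a guard placed before the subtraction is where a
more descriptive error message could go.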

You might want to compile a modified PostgreSQL executable that puts an
extremely long sleep (like a year) just before this error is reported.
Then, when the system hangs at that point, you can attach a debugger
and pull a stack backtrace.  Or you could insert an abort() at that
point in the code and get a backtrace from the core dump.
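
A rough sketch of such a trap, written as standalone C rather than as a
patch against the real source, might look like the code below.  The
function name trap_bad_alloc and the trap_mode switch are hypothetical.
In an actual build, the call would presumably go right before the
elog(ERROR, ...) that reports the bad size in the memory-context
allocation code (src/backend/utils/mmgr/mcxt.c).

```c
#define _POSIX_C_SOURCE 200112L

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Modeled on PostgreSQL's MaxAllocSize (1 gigabyte - 1). */
#define MAX_ALLOC_SIZE UINT64_C(0x3fffffff)

/* Hypothetical debugging modes; not an actual PostgreSQL setting. */
enum trap_mode { TRAP_NONE, TRAP_SLEEP, TRAP_ABORT };

/*
 * Call just before the "invalid memory alloc request size" error would be
 * raised.  Returns 0 for a valid size, 1 for an invalid size in TRAP_NONE
 * mode; the other modes never return for an invalid size.
 */
static int
trap_bad_alloc(uint64_t size, enum trap_mode mode)
{
    if (size <= MAX_ALLOC_SIZE)
        return 0;               /* request is fine; nothing to trap */

    /* Log the pid so you know which backend to attach to. */
    fprintf(stderr, "invalid memory alloc request size %llu (pid %ld)\n",
            (unsigned long long) size, (long) getpid());

    if (mode == TRAP_SLEEP)
    {
        /* Park the process indefinitely so a debugger can attach. */
        for (;;)
            sleep(3600);
    }
    if (mode == TRAP_ABORT)
        abort();                /* SIGABRT: dumps core if core dumps are enabled */

    return 1;
}
```

With TRAP_SLEEP you would attach with "gdb -p <pid>" and run "bt".  With
TRAP_ABORT you would make sure core dumps are enabled for the server
process (for example "ulimit -c unlimited" in the shell that starts it)
and then load the core file in gdb.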

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
