Re: [HACKERS] [BUGS] Failed assert ((data - start) == data_size) in heaptuple.c

2011-04-09 Thread Brendan Jurd
On 9 April 2011 00:41, Alvaro Herrera alvhe...@commandprompt.com wrote:
 Excerpts from Brendan Jurd's message of vie abr 08 06:00:22 -0300 2011:
 Memtest didn't report any errors.  I intend to try swapping out the
 RAM tomorrow, but in the meantime we got a *different* assertion
 failure today.  The fact that we are tripping over various different
 assertions seems to lend further weight to the flaky hardware
 hypothesis.

 TRAP: FailedAssertion(!(((lpp)-lp_flags == 1)), File: heapam.c, Line: 
 727)

 Yep.


I swapped the RAM with another machine, and after a few hours running
the other machine popped a segfault.  The faulty RAM diagnosis is now
official, so I won't be bothering you folks about this any further.

Cheers,
BJ

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] Failed assert ((data - start) == data_size) in heaptuple.c

2011-04-08 Thread Brendan Jurd
On 7 April 2011 16:56, Tom Lane t...@sss.pgh.pa.us wrote:
 Brendan Jurd dire...@gmail.com writes:
 TRAP: FailedAssertion(!((data - start) == data_size), File:
 heaptuple.c, Line: 255)

 [ scratches head ... ]  That implies that heap_fill_tuple came to a
 different conclusion about a tuple's data size than the immediately
 preceding heap_compute_data_size.  Which I would sure want to believe
 is impossible.  Have you checked for flaky memory on this machine?


Memtest didn't report any errors.  I intend to try swapping out the
RAM tomorrow, but in the meantime we got a *different* assertion
failure today.  The fact that we are tripping over various different
assertions seems to lend further weight to the flaky hardware
hypothesis.

TRAP: FailedAssertion(!(((lpp)-lp_flags == 1)), File: heapam.c, Line: 727)

#0  0x7f2773f23a75 in *__GI_raise (sig=value optimised out)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x7f2773f275c0 in *__GI_abort () at abort.c:92
#2  0x006f9eed in ExceptionalCondition (conditionName=value
optimised out,
errorType=value optimised out, fileName=value optimised out,
lineNumber=value optimised out) at assert.c:57
#3  0x00473641 in heapgettup_pagemode (scan=0x2366da8,
dir=value optimised out,
nkeys=value optimised out, key=value optimised out) at heapam.c:727
#4  0x00474b16 in heap_getnext (scan=0x2366da8,
direction=1495) at heapam.c:1322
#5  0x00590fcb in SeqNext (node=value optimised out) at
nodeSeqscan.c:66
#6  0x005808ff in ExecScanFetch (node=0x22d5ff8,
accessMtd=value optimised out,
recheckMtd=value optimised out) at execScan.c:82
#7  ExecScan (node=0x22d5ff8, accessMtd=value optimised out,
recheckMtd=value optimised out) at execScan.c:164
#8  0x00578d58 in ExecProcNode (node=0x22d5ff8) at execProcnode.c:378
#9  0x0058abf7 in ExecHashJoinOuterGetTuple (node=0x22d4a60)
at nodeHashjoin.c:562
#10 ExecHashJoin (node=0x22d4a60) at nodeHashjoin.c:187
#11 0x00578ca8 in ExecProcNode (node=0x22d4a60) at execProcnode.c:427
#12 0x0058abf7 in ExecHashJoinOuterGetTuple (node=0x22d3430)
at nodeHashjoin.c:562
#13 ExecHashJoin (node=0x22d3430) at nodeHashjoin.c:187
#14 0x00578ca8 in ExecProcNode (node=0x22d3430) at execProcnode.c:427
#15 0x00590021 in ExecNestLoop (node=0x22d26d8) at nodeNestloop.c:120
#16 0x00578cc8 in ExecProcNode (node=0x22d26d8) at execProcnode.c:419
#17 0x00590021 in ExecNestLoop (node=0x22c0c88) at nodeNestloop.c:120
#18 0x00578cc8 in ExecProcNode (node=0x22c0c88) at execProcnode.c:419
#19 0x00591bf9 in ExecSort (node=0x22c0a50) at nodeSort.c:102
#20 0x00578c88 in ExecProcNode (node=0x22c0a50) at execProcnode.c:438
#21 0x0057795e in ExecutePlan (queryDesc=0x23151f0,
direction=1495, count=0)
at execMain.c:1187
#22 standard_ExecutorRun (queryDesc=0x23151f0, direction=1495,
count=0) at execMain.c:280
#23 0x00643d67 in PortalRunSelect (portal=0x229bf78,
forward=value optimised out, count=0, dest=0x218a120) at pquery.c:952
#24 0x00645210 in PortalRun (portal=value optimised out,
count=value optimised out, isTopLevel=value optimised out,
dest=value optimised out, altdest=value optimised out,
completionTag=value optimised out) at pquery.c:796
#25 0x006428dc in exec_execute_message (argc=value optimised out,
argv=value optimised out, username=value optimised out) at
postgres.c:2003
#26 PostgresMain (argc=value optimised out, argv=value optimised out,
username=value optimised out) at postgres.c:3988
#27 0x00607351 in BackendRun () at postmaster.c:3555
#28 BackendStartup () at postmaster.c:3242
#29 ServerLoop () at postmaster.c:1431
#30 0x00609c6d in PostmasterMain (argc=35406528, argv=0x2185160)
at postmaster.c:1092
#31 0x005a99a0 in main (argc=5, argv=0x2185140) at main.c:188

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] Failed assert ((data - start) == data_size) in heaptuple.c

2011-04-08 Thread Alvaro Herrera
Excerpts from Brendan Jurd's message of vie abr 08 06:00:22 -0300 2011:

 Memtest didn't report any errors.  I intend to try swapping out the
 RAM tomorrow, but in the meantime we got a *different* assertion
 failure today.  The fact that we are tripping over various different
 assertions seems to lend further weight to the flaky hardware
 hypothesis.
 
 TRAP: FailedAssertion(!(((lpp)-lp_flags == 1)), File: heapam.c, Line: 
 727)

Yep.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] Failed assert ((data - start) == data_size) in heaptuple.c

2011-04-07 Thread Tom Lane
Brendan Jurd dire...@gmail.com writes:
 TRAP: FailedAssertion(!((data - start) == data_size), File:
 heaptuple.c, Line: 255)

[ scratches head ... ]  That implies that heap_fill_tuple came to a
different conclusion about a tuple's data size than the immediately
preceding heap_compute_data_size.  Which I would sure want to believe
is impossible.  Have you checked for flaky memory on this machine?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] Failed assert ((data - start) == data_size) in heaptuple.c

2011-04-07 Thread Brendan Jurd
On 7 April 2011 16:56, Tom Lane t...@sss.pgh.pa.us wrote:
 Brendan Jurd dire...@gmail.com writes:
 TRAP: FailedAssertion(!((data - start) == data_size), File:
 heaptuple.c, Line: 255)

 [ scratches head ... ]  That implies that heap_fill_tuple came to a
 different conclusion about a tuple's data size than the immediately
 preceding heap_compute_data_size.  Which I would sure want to believe
 is impossible.  Have you checked for flaky memory on this machine?


We are doing so now -- although the RAM is ECC and just a few months
old, so flakiness seems a distant possibility.  I will report back
after we've given it a proper thrashing with memtest.

Cheers,
BJ

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] Failed assert ((data - start) == data_size) in heaptuple.c

2011-04-07 Thread Craig Ringer

On 04/07/2011 03:07 PM, Brendan Jurd wrote:

On 7 April 2011 16:56, Tom Lanet...@sss.pgh.pa.us  wrote:

Brendan Jurddire...@gmail.com  writes:

TRAP: FailedAssertion(!((data - start) == data_size), File:
heaptuple.c, Line: 255)


[ scratches head ... ]  That implies that heap_fill_tuple came to a
different conclusion about a tuple's data size than the immediately
preceding heap_compute_data_size.  Which I would sure want to believe
is impossible.  Have you checked for flaky memory on this machine?



We are doing so now -- although the RAM is ECC and just a few months
old, so flakiness seems a distant possibility.  I will report back
after we've given it a proper thrashing with memtest.


Apparently bad RAM can also mean faulty CPU (bad cache, heat problems, 
etc). memtest86 seems ... rough ... at best when it comes to finding 
issues; I've had some systems run it for a day yet continuously segfault 
in real-world use until the RAM was re-seated or swapped out.


--
Craig Ringer

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] Failed assert ((data - start) == data_size) in heaptuple.c

2011-04-07 Thread Alvaro Herrera
Excerpts from Brendan Jurd's message of jue abr 07 03:07:32 -0300 2011:
 Hi folks,
 
 I am running a 9.0.3 Hot Standy + Streaming Replication slave which
 occasionally segfaults (every 1-2 days).  I rebuilt Postgres with
 --enable-cassert and --enable-debug, switched on core dumping and
 waited for some results.

What's the platform, and what's the query?  Are there funny datatypes
involved?

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] Failed assert ((data - start) == data_size) in heaptuple.c

2011-04-07 Thread Brendan Jurd
On 8 April 2011 00:16, Alvaro Herrera alvhe...@commandprompt.com wrote:
 Excerpts from Brendan Jurd's message of jue abr 07 03:07:32 -0300 2011:
 I am running a 9.0.3 Hot Standy + Streaming Replication slave which
 occasionally segfaults (every 1-2 days).  I rebuilt Postgres with
 --enable-cassert and --enable-debug, switched on core dumping and
 waited for some results.

 What's the platform, and what's the query?  Are there funny datatypes
 involved?

Ubuntu 10.04 x64 on:

HP DL380R05
1x Quad Core Xeon E5440
10GB PC 5400 DDR ECC
2x HP 146GB 15krpm SAS drives in RAID 1+0

The tomcat instance repeatedly runs a series of some 9 queries, I'm
not sure which of the queries is the culprit or even whether it is the
same one each time.  However, they are all straightforward SELECTs.
The one with the most complicated plan joins a whole six tables.  I do
keep the transaction open until I have executed all the SELECTs in the
series, then commit and start over again with a fresh transaction.
That's just to make sure all of the queries are pulling data from the
same snapshot.

As for datatypes, I do have one type that I have defined which is used
in one of the queries.  It's just an RGB colour value, defined as a
composite type:

CREATE DOMAIN colour_channel AS smallint
CHECK (VALUE = 0 AND VALUE  256);

CREATE TYPE rgb AS (
red colour_channel,
green   colour_channel,
bluecolour_channel
);

All of the user-defined functions I have written for this db are
either SQL or PL/pgSQL, and all of the functions called by these
queries are either STABLE or IMMUTABLE.

Cheers,
BJ

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers