Re: [HACKERS] [BUGS] Failed assert ((data - start) == data_size) in heaptuple.c
On 9 April 2011 00:41, Alvaro Herrera alvhe...@commandprompt.com wrote:
> Excerpts from Brendan Jurd's message of Fri Apr 08 06:00:22 -0300 2011:
>> Memtest didn't report any errors. I intend to try swapping out the RAM tomorrow, but in the meantime we got a *different* assertion failure today. The fact that we are tripping over various different assertions seems to lend further weight to the flaky hardware hypothesis.
>>
>> TRAP: FailedAssertion(!(((lpp)->lp_flags == 1)), File: heapam.c, Line: 727)
>
> Yep.

I swapped the RAM with another machine, and after a few hours of running, the other machine popped a segfault. The faulty-RAM diagnosis is now official, so I won't be bothering you folks about this any further.

Cheers,
BJ

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] Failed assert ((data - start) == data_size) in heaptuple.c
On 7 April 2011 16:56, Tom Lane t...@sss.pgh.pa.us wrote:
> Brendan Jurd dire...@gmail.com writes:
>> TRAP: FailedAssertion(!((data - start) == data_size), File: heaptuple.c, Line: 255)
>
> [ scratches head ... ] That implies that heap_fill_tuple came to a different conclusion about a tuple's data size than the immediately preceding heap_compute_data_size. Which I would sure want to believe is impossible. Have you checked for flaky memory on this machine?

Memtest didn't report any errors. I intend to try swapping out the RAM tomorrow, but in the meantime we got a *different* assertion failure today. The fact that we are tripping over various different assertions seems to lend further weight to the flaky hardware hypothesis.

TRAP: FailedAssertion(!(((lpp)->lp_flags == 1)), File: heapam.c, Line: 727)

#0  0x7f2773f23a75 in *__GI_raise (sig=<value optimised out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x7f2773f275c0 in *__GI_abort () at abort.c:92
#2  0x006f9eed in ExceptionalCondition (conditionName=<value optimised out>, errorType=<value optimised out>, fileName=<value optimised out>, lineNumber=<value optimised out>) at assert.c:57
#3  0x00473641 in heapgettup_pagemode (scan=0x2366da8, dir=<value optimised out>, nkeys=<value optimised out>, key=<value optimised out>) at heapam.c:727
#4  0x00474b16 in heap_getnext (scan=0x2366da8, direction=1495) at heapam.c:1322
#5  0x00590fcb in SeqNext (node=<value optimised out>) at nodeSeqscan.c:66
#6  0x005808ff in ExecScanFetch (node=0x22d5ff8, accessMtd=<value optimised out>, recheckMtd=<value optimised out>) at execScan.c:82
#7  ExecScan (node=0x22d5ff8, accessMtd=<value optimised out>, recheckMtd=<value optimised out>) at execScan.c:164
#8  0x00578d58 in ExecProcNode (node=0x22d5ff8) at execProcnode.c:378
#9  0x0058abf7 in ExecHashJoinOuterGetTuple (node=0x22d4a60) at nodeHashjoin.c:562
#10 ExecHashJoin (node=0x22d4a60) at nodeHashjoin.c:187
#11 0x00578ca8 in ExecProcNode (node=0x22d4a60) at execProcnode.c:427
#12 0x0058abf7 in ExecHashJoinOuterGetTuple (node=0x22d3430) at nodeHashjoin.c:562
#13 ExecHashJoin (node=0x22d3430) at nodeHashjoin.c:187
#14 0x00578ca8 in ExecProcNode (node=0x22d3430) at execProcnode.c:427
#15 0x00590021 in ExecNestLoop (node=0x22d26d8) at nodeNestloop.c:120
#16 0x00578cc8 in ExecProcNode (node=0x22d26d8) at execProcnode.c:419
#17 0x00590021 in ExecNestLoop (node=0x22c0c88) at nodeNestloop.c:120
#18 0x00578cc8 in ExecProcNode (node=0x22c0c88) at execProcnode.c:419
#19 0x00591bf9 in ExecSort (node=0x22c0a50) at nodeSort.c:102
#20 0x00578c88 in ExecProcNode (node=0x22c0a50) at execProcnode.c:438
#21 0x0057795e in ExecutePlan (queryDesc=0x23151f0, direction=1495, count=0) at execMain.c:1187
#22 standard_ExecutorRun (queryDesc=0x23151f0, direction=1495, count=0) at execMain.c:280
#23 0x00643d67 in PortalRunSelect (portal=0x229bf78, forward=<value optimised out>, count=0, dest=0x218a120) at pquery.c:952
#24 0x00645210 in PortalRun (portal=<value optimised out>, count=<value optimised out>, isTopLevel=<value optimised out>, dest=<value optimised out>, altdest=<value optimised out>, completionTag=<value optimised out>) at pquery.c:796
#25 0x006428dc in exec_execute_message (argc=<value optimised out>, argv=<value optimised out>, username=<value optimised out>) at postgres.c:2003
#26 PostgresMain (argc=<value optimised out>, argv=<value optimised out>, username=<value optimised out>) at postgres.c:3988
#27 0x00607351 in BackendRun () at postmaster.c:3555
#28 BackendStartup () at postmaster.c:3242
#29 ServerLoop () at postmaster.c:1431
#30 0x00609c6d in PostmasterMain (argc=35406528, argv=0x2185160) at postmaster.c:1092
#31 0x005a99a0 in main (argc=5, argv=0x2185140) at main.c:188
Re: [HACKERS] [BUGS] Failed assert ((data - start) == data_size) in heaptuple.c
Excerpts from Brendan Jurd's message of Fri Apr 08 06:00:22 -0300 2011:
> Memtest didn't report any errors. I intend to try swapping out the RAM tomorrow, but in the meantime we got a *different* assertion failure today. The fact that we are tripping over various different assertions seems to lend further weight to the flaky hardware hypothesis.
>
> TRAP: FailedAssertion(!(((lpp)->lp_flags == 1)), File: heapam.c, Line: 727)

Yep.

--
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] [BUGS] Failed assert ((data - start) == data_size) in heaptuple.c
Brendan Jurd dire...@gmail.com writes:
> TRAP: FailedAssertion(!((data - start) == data_size), File: heaptuple.c, Line: 255)

[ scratches head ... ] That implies that heap_fill_tuple came to a different conclusion about a tuple's data size than the immediately preceding heap_compute_data_size. Which I would sure want to believe is impossible. Have you checked for flaky memory on this machine?

regards, tom lane
Re: [HACKERS] [BUGS] Failed assert ((data - start) == data_size) in heaptuple.c
On 7 April 2011 16:56, Tom Lane t...@sss.pgh.pa.us wrote:
> Brendan Jurd dire...@gmail.com writes:
>> TRAP: FailedAssertion(!((data - start) == data_size), File: heaptuple.c, Line: 255)
>
> [ scratches head ... ] That implies that heap_fill_tuple came to a different conclusion about a tuple's data size than the immediately preceding heap_compute_data_size. Which I would sure want to believe is impossible. Have you checked for flaky memory on this machine?

We are doing so now -- although the RAM is ECC and just a few months old, so flakiness seems a distant possibility. I will report back after we've given it a proper thrashing with memtest.

Cheers,
BJ
Re: [HACKERS] [BUGS] Failed assert ((data - start) == data_size) in heaptuple.c
On 04/07/2011 03:07 PM, Brendan Jurd wrote:
> On 7 April 2011 16:56, Tom Lane t...@sss.pgh.pa.us wrote:
>> Brendan Jurd dire...@gmail.com writes:
>>> TRAP: FailedAssertion(!((data - start) == data_size), File: heaptuple.c, Line: 255)
>>
>> [ scratches head ... ] That implies that heap_fill_tuple came to a different conclusion about a tuple's data size than the immediately preceding heap_compute_data_size. Which I would sure want to believe is impossible. Have you checked for flaky memory on this machine?
>
> We are doing so now -- although the RAM is ECC and just a few months old, so flakiness seems a distant possibility. I will report back after we've given it a proper thrashing with memtest.

Apparently, what looks like bad RAM can also be a faulty CPU (bad cache, heat problems, etc.). memtest86 seems ... rough ... at best when it comes to finding issues; I've had some systems run it for a day, yet continuously segfault in real-world use until the RAM was re-seated or swapped out.

--
Craig Ringer
Re: [HACKERS] [BUGS] Failed assert ((data - start) == data_size) in heaptuple.c
Excerpts from Brendan Jurd's message of Thu Apr 07 03:07:32 -0300 2011:
> Hi folks,
>
> I am running a 9.0.3 Hot Standby + Streaming Replication slave which occasionally segfaults (every 1-2 days). I rebuilt Postgres with --enable-cassert and --enable-debug, switched on core dumping and waited for some results.

What's the platform, and what's the query? Are there funny datatypes involved?

--
Álvaro Herrera alvhe...@commandprompt.com
Re: [HACKERS] [BUGS] Failed assert ((data - start) == data_size) in heaptuple.c
On 8 April 2011 00:16, Alvaro Herrera alvhe...@commandprompt.com wrote:
> Excerpts from Brendan Jurd's message of Thu Apr 07 03:07:32 -0300 2011:
>> I am running a 9.0.3 Hot Standby + Streaming Replication slave which occasionally segfaults (every 1-2 days). I rebuilt Postgres with --enable-cassert and --enable-debug, switched on core dumping and waited for some results.
>
> What's the platform, and what's the query? Are there funny datatypes involved?

Ubuntu 10.04 x64 on:

HP DL380R05
1x Quad Core Xeon E5440
10GB PC 5400 DDR ECC
2x HP 146GB 15krpm SAS drives in RAID 1+0

The tomcat instance repeatedly runs a series of some 9 queries; I'm not sure which of the queries is the culprit, or even whether it is the same one each time. However, they are all straightforward SELECTs. The one with the most complicated plan joins a whole six tables.

I do keep the transaction open until I have executed all the SELECTs in the series, then commit and start over again with a fresh transaction. That's just to make sure all of the queries are pulling data from the same snapshot.

As for datatypes, I do have one type that I have defined which is used in one of the queries. It's just an RGB colour value, defined as a composite type:

CREATE DOMAIN colour_channel AS smallint CHECK (VALUE >= 0 AND VALUE < 256);
CREATE TYPE rgb AS (
    red colour_channel,
    green colour_channel,
    blue colour_channel
);

All of the user-defined functions I have written for this db are either SQL or PL/pgSQL, and all of the functions called by these queries are either STABLE or IMMUTABLE.

Cheers,
BJ