Re: [HACKERS] assertion failure 9.3.4

2014-04-24 Thread Alvaro Herrera
Alvaro Herrera wrote: Alvaro Herrera wrote: I'm thinking about the comparison of full infomask as you propose instead of just the bits that we actually care about. I think the only thing that could cause a spurious failure (causing an extra execution of the HeapTupleSatisfiesUpdate

Re: [HACKERS] assertion failure 9.3.4

2014-04-23 Thread Alvaro Herrera
Alvaro Herrera wrote: I'm thinking about the comparison of full infomask as you propose instead of just the bits that we actually care about. I think the only thing that could cause a spurious failure (causing an extra execution of the HeapTupleSatisfiesUpdate call and the stuff below) is

Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Andres Freund
On 2014-04-21 19:43:15 -0400, Andrew Dunstan wrote: On 04/21/2014 02:54 PM, Andres Freund wrote: Hi, I spent the last two hours poking around in the environment Andrew provided and I was able to reproduce the issue, find an assert to reproduce it much faster and find a possible root

Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Alvaro Herrera
Andres Freund wrote: On 2014-04-21 19:43:15 -0400, Andrew Dunstan wrote: On 04/21/2014 02:54 PM, Andres Freund wrote: Hi, I spent the last two hours poking around in the environment Andrew provided and I was able to reproduce the issue, find an assert to reproduce it much faster

Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Josh Berkus
On 04/22/2014 02:01 PM, Alvaro Herrera wrote: I think I should push this patch first, so that Andrew and Josh can try their respective test cases which should start throwing errors, then push the actual fixes. Does that sound okay? Note that I have a limited ability to actually test my

Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Andrew Dunstan
On 04/22/2014 05:20 PM, Josh Berkus wrote: On 04/22/2014 02:01 PM, Alvaro Herrera wrote: I think I should push this patch first, so that Andrew and Josh can try their respective test cases which should start throwing errors, then push the actual fixes. Does that sound okay? Note that I have

Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Josh Berkus
On 04/22/2014 02:01 PM, Alvaro Herrera wrote: Some testing later, I think the issue only occurs if we determine that we don't need to wait for the xid/multi to complete, because otherwise the wait itself saves us. (It's easy to cause the problem by adding a breakpoint in heapam.c:3325, i.e.

Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Andres Freund
On 2014-04-22 17:36:42 -0400, Andrew Dunstan wrote: On 04/22/2014 05:20 PM, Josh Berkus wrote: On 04/22/2014 02:01 PM, Alvaro Herrera wrote: I think I should push this patch first, so that Andrew and Josh can try their respective test cases which should start throwing errors, then push the

Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Andres Freund
On 2014-04-22 14:40:46 -0700, Josh Berkus wrote: On 04/22/2014 02:01 PM, Alvaro Herrera wrote: Some testing later, I think the issue only occurs if we determine that we don't need to wait for the xid/multi to complete, because otherwise the wait itself saves us. (It's easy to cause the

Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Andres Freund
On 2014-04-22 18:01:40 -0300, Alvaro Herrera wrote: Thanks for the analysis and patches. I've been playing with this on my own a bit, and one thing that I just noticed is that at least for heap_update I cannot reproduce a problem when the xmax is originally a multixact, so AFAICT the number

Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Josh Berkus
In order to encounter this issue, I'd need to have two concurrent processes update the child records of the same parent record? That is: A --- B1 \--- B2 ... and the issue should only happen if I update both B1 and B2 concurrently in separate sessions? I don't think that'll trigger

Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Andres Freund
On 2014-04-22 14:49:00 -0700, Josh Berkus wrote: In order to encounter this issue, I'd need to have two concurrent processes update the child records of the same parent record? That is: A --- B1 \--- B2 ... and the issue should only happen if I update both B1 and B2

Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Alvaro Herrera
Josh Berkus wrote: In order to encounter this issue, I'd need to have two concurrent processes update the child records of the same parent record? That is: A --- B1 \--- B2 ... and the issue should only happen if I update both B1 and B2 concurrently in separate sessions?

Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Alvaro Herrera
Andres Freund wrote: On 2014-04-22 18:01:40 -0300, Alvaro Herrera wrote: Thanks for the analysis and patches. I've been playing with this on my own a bit, and one thing that I just noticed is that at least for heap_update I cannot reproduce a problem when the xmax is originally a

Re: [HACKERS] assertion failure 9.3.4

2014-04-22 Thread Josh Berkus
On 04/22/2014 05:07 PM, Alvaro Herrera wrote: If you want to make it easier to reproduce, you need to insert some pg_usleep() calls in carefully selected spots. As Andres says, the window is small normally. Yeah, but the whole point of this is that having pg_stat_statements/auto_explain

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Josh Berkus
All, More on this: 1) I've confirmed at the 2nd site that the issue doesn't happen if pg_stat_statements.so is not loaded. So this seems to be confirmation that either auto_explain, pg_stat_statements, or both need to be loaded (but not necessarily created as extensions) in order to have the

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Stephen Frost
* Josh Berkus (j...@agliodbs.com) wrote: 1) I've confirmed at the 2nd site that the issue doesn't happen if pg_stat_statements.so is not loaded. So this seems to be confirmation that either auto_explain, pg_stat_statements, or both need to be loaded (but not necessarily created as extensions)

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Josh Berkus
Can you get the infomask bits..? What does pg_controldata report wrt the MultiXids? Can't get the infomask bits. pg_controldata attached, with some redactions. Unfortunately, it appears that they've continued to do tests on this system, so the XID counter has advanced somewhat.

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes: 1) I've confirmed at the 2nd site that the issue doesn't happen if pg_stat_statements.so is not loaded. So this seems to be confirmation that either auto_explain, pg_stat_statements, or both need to be loaded (but not necessarily created as extensions) in

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Andres Freund
Hi, I spent the last two hours poking around in the environment Andrew provided and I was able to reproduce the issue, find an assert to reproduce it much faster and find a possible root cause. Since the symptom of the problem seems to be multixacts with more than one updating xid, I added a

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes: I spent the last two hours poking around in the environment Andrew provided and I was able to reproduce the issue, find an assert to reproduce it much faster and find a possible root cause. Hmm ... is this the same thing Josh is reporting? If so,

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Josh Berkus
On 04/21/2014 12:26 PM, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: I spent the last two hours poking around in the environment Andrew provided and I was able to reproduce the issue, find an assert to reproduce it much faster and find a possible root cause. Hmm ... is this

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Andrew Dunstan
On 04/21/2014 03:26 PM, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: I spent the last two hours poking around in the environment Andrew provided and I was able to reproduce the issue, find an assert to reproduce it much faster and find a possible root cause. Hmm ... is this the

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Andres Freund
On 2014-04-21 15:26:03 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: I spent the last two hours poking around in the environment Andrew provided and I was able to reproduce the issue, find an assert to reproduce it much faster and find a possible root cause. Hmm ...

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Andres Freund
On 2014-04-21 12:31:09 -0700, Josh Berkus wrote: On 04/21/2014 12:26 PM, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: I spent the last two hours poking around in the environment Andrew provided and I was able to reproduce the issue, find an assert to reproduce it much

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Josh Berkus
Josh, how long does it take you to reproduce the issue? A couple hours. And can you reproduce it outside of a production environment? Not yet, still working on that.

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Andrew Dunstan
On 04/21/2014 02:54 PM, Andres Freund wrote: Hi, I spent the last two hours poking around in the environment Andrew provided and I was able to reproduce the issue, find an assert to reproduce it much faster and find a possible root cause. What's the assert that makes it happen faster? That

Re: [HACKERS] assertion failure 9.3.4

2014-04-21 Thread Josh Berkus
All, I've taken a stab at creating a reproducible test case based on the characteristics of the production issues I'm seeing. But clearly there's an element I'm missing, because I'm not able to produce the bug with a pgbench-based test case. My current test has FKs, updating both FK'd tables,

Re: [HACKERS] assertion failure 9.3.4

2014-04-18 Thread Andrew Dunstan
On 04/17/2014 10:15 AM, Andrew Dunstan wrote: On 04/16/2014 10:28 PM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 04/16/2014 07:19 PM, Tom Lane wrote: Yeah, it would be real nice to see a self-contained test case for this. Well, that might be hard to put together, but I

Re: [HACKERS] assertion failure 9.3.4

2014-04-18 Thread Josh Berkus
On 04/18/2014 09:42 AM, Andrew Dunstan wrote: There definitely seems to be something going on involving these two pre-loaded modules. With both auto_explain and pg_stat_statements preloaded I can reproduce the error fairly reliably. I have also reproduced it, but less reliably, with

Re: [HACKERS] assertion failure 9.3.4

2014-04-17 Thread Andrew Dunstan
On 04/16/2014 10:28 PM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 04/16/2014 07:19 PM, Tom Lane wrote: Yeah, it would be real nice to see a self-contained test case for this. Well, that might be hard to put together, but I did try running without pg_stat_statements and

Re: [HACKERS] assertion failure 9.3.4

2014-04-17 Thread Josh Berkus
All, So I have encountered a 2nd report of this issue, or of an issue which sounds very similar:
- corruption in two queue tables
- the tables are written in a high-concurrency, lock-contested environment
- user uses SELECT FOR UPDATE with these tables
- pg_stat_statements .so is loaded, but

Re: [HACKERS] assertion failure 9.3.4

2014-04-17 Thread Peter Geoghegan
On Thu, Apr 17, 2014 at 7:15 AM, Andrew Dunstan and...@dunslane.net wrote:
track_activity_query_size = 10240
shared_preload_libraries = 'auto_explain,pg_stat_statements'
As you can see, auto_explain's log_min_duration hasn't been set, so it shouldn't be doing anything very much, I should

Re: [HACKERS] assertion failure 9.3.4

2014-04-17 Thread Andrew Dunstan
On 04/17/2014 09:04 PM, Peter Geoghegan wrote: On Thu, Apr 17, 2014 at 7:15 AM, Andrew Dunstan and...@dunslane.net wrote:
track_activity_query_size = 10240
shared_preload_libraries = 'auto_explain,pg_stat_statements'
As you can see, auto_explain's log_min_duration hasn't been set, so it

Re: [HACKERS] assertion failure 9.3.4

2014-04-16 Thread Alvaro Herrera
So, from top to bottom I see the following elements:
* backend is executing a query
* this query is getting captured by pg_stat_statements
* the query is also getting captured by auto_explain, in chain from pg_stat_statements
* auto_explain runs the query, which invokes a plpgsql function
* this

Re: [HACKERS] assertion failure 9.3.4

2014-04-16 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes: I'm not quite clear on why the third query, the one in ri_PerformCheck, is invoking a sequence. It's not --- SeqNext is the next-tuple function for a sequential scan. Nothing to do with sequences. Now, it *is* worth wondering why the heck a query

Re: [HACKERS] assertion failure 9.3.4

2014-04-16 Thread Andrew Dunstan
On 04/16/2014 07:19 PM, Tom Lane wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: I'm not quite clear on why the third query, the one in ri_PerformCheck, is invoking a sequence. It's not --- SeqNext is the next-tuple function for a sequential scan. Nothing to do with sequences. Now, it

Re: [HACKERS] assertion failure 9.3.4

2014-04-16 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes: On 04/16/2014 07:19 PM, Tom Lane wrote: Yeah, it would be real nice to see a self-contained test case for this. Well, that might be hard to put together, but I did try running without pg_stat_statements and auto_explain loaded and the error did not

Re: [HACKERS] assertion failure 9.3.4

2014-04-16 Thread Alvaro Herrera
Andrew Dunstan wrote: On 04/16/2014 07:19 PM, Tom Lane wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: I'm not quite clear on why the third query, the one in ri_PerformCheck, is invoking a sequence. It's not --- SeqNext is the next-tuple function for a sequential scan. Nothing to do

Re: [HACKERS] assertion failure 9.3.4

2014-04-14 Thread Andrew Dunstan
On 04/14/2014 09:28 PM, Andrew Dunstan wrote: With a client's code I have just managed to produce the following assertion failure on 9.3.4: 2014-04-15 01:02:46 GMT [19854] 76299: LOG: execute unnamed: select * from asp_ins_event_task_log( job_id:=1, event_id:=3164,

Re: [HACKERS] assertion failure 9.3.4

2014-04-14 Thread Alvaro Herrera
Andrew Dunstan wrote: and here the stack trace:
#0 0x00361ba36285 in __GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00361ba37b9b in __GI_abort () at abort.c:91
#2 0x0075c157 in ExceptionalCondition (conditionName=<optimized out>,

Re: [HACKERS] assertion failure 9.3.4

2014-04-14 Thread Andrew Dunstan
On 04/14/2014 10:02 PM, Alvaro Herrera wrote: Andrew Dunstan wrote: and here the stack trace:
#0 0x00361ba36285 in __GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00361ba37b9b in __GI_abort () at abort.c:91
#2 0x0075c157 in