Re: [HACKERS] multixacts woes

2015-05-11 Thread Robert Haas
On Mon, May 11, 2015 at 12:56 AM, Noah Misch n...@leadboat.com wrote: On Sun, May 10, 2015 at 09:17:58PM -0400, Robert Haas wrote: On Sun, May 10, 2015 at 1:40 PM, Noah Misch n...@leadboat.com wrote: I don't know whether this deserves prompt remediation, but if it does, I would look no

Re: [HACKERS] multixacts woes

2015-05-11 Thread Noah Misch
On Mon, May 11, 2015 at 08:29:05AM -0400, Robert Haas wrote: Given your concerns, and the need to get a fix for this out the door quickly, what I'm inclined to do for the present is go bump the threshold from 25% of MaxMultiXact to 50% of MaxMultiXact without changing anything else. +1 Your

Re: [HACKERS] multixacts woes

2015-05-11 Thread Joshua D. Drake
On 05/11/2015 10:24 AM, Josh Berkus wrote: In terms of adding a new GUC in 9.5: can't we take a stab at auto-tuning this instead of adding a new GUC? We already have a bunch of freezing GUCs which fewer than 1% of our user base has any idea how to set. That is a documentation problem not a

Re: [HACKERS] multixacts woes

2015-05-11 Thread Josh Berkus
On 05/11/2015 09:54 AM, Robert Haas wrote: OK, I have made this change. Barring further trouble reports, this completes the multixact work I plan to do for the next release. Here is what is outstanding: 1. We might want to introduce a GUC to control the point at which member offset

Re: [HACKERS] multixacts woes

2015-05-11 Thread Alvaro Herrera
Josh Berkus wrote: In terms of adding a new GUC in 9.5: can't we take a stab at auto-tuning this instead of adding a new GUC? We already have a bunch of freezing GUCs which fewer than 1% of our user base has any idea how to set. If you have development resources to pour onto 9.5, I think it

Re: [HACKERS] multixacts woes

2015-05-11 Thread Alvaro Herrera
Robert Haas wrote: OK, I have made this change. Barring further trouble reports, this completes the multixact work I plan to do for the next release. Many thanks for all the effort here -- much appreciated. 2. The recent changes adjust things - for good reason - so that the safe threshold

Re: [HACKERS] multixacts woes

2015-05-11 Thread Robert Haas
On Mon, May 11, 2015 at 10:11 AM, Noah Misch n...@leadboat.com wrote: On Mon, May 11, 2015 at 08:29:05AM -0400, Robert Haas wrote: Given your concerns, and the need to get a fix for this out the door quickly, what I'm inclined to do for the present is go bump the threshold from 25% of

Re: [HACKERS] multixacts woes

2015-05-10 Thread Noah Misch
On Sun, May 10, 2015 at 09:17:58PM -0400, Robert Haas wrote: On Sun, May 10, 2015 at 1:40 PM, Noah Misch n...@leadboat.com wrote: I don't know whether this deserves prompt remediation, but if it does, I would look no further than the hard-coded 25% figure. We permit users to operate

Re: [HACKERS] multixacts woes

2015-05-10 Thread Robert Haas
On Sun, May 10, 2015 at 1:40 PM, Noah Misch n...@leadboat.com wrote: I don't know whether this deserves prompt remediation, but if it does, I would look no further than the hard-coded 25% figure. We permit users to operate close to XID wraparound design limits. GUC maximums force an

Re: [HACKERS] multixacts woes

2015-05-10 Thread Jim Nasby
On 5/8/15 1:15 PM, Robert Haas wrote: I somehow did not realize until very recently that we actually use two SLRUs to keep track of multixacts: one for the multixacts themselves (pg_multixact/offsets) and one for the members (pg_multixact/members). Confusingly, members are sometimes called

Re: [HACKERS] multixacts woes

2015-05-10 Thread Noah Misch
On Fri, May 08, 2015 at 02:15:44PM -0400, Robert Haas wrote: My colleague Thomas Munro and I have been working with Alvaro, and also with Kevin and Amit, to fix bug #12990, a multixact-related data corruption bug. Thanks Alvaro, Amit, Kevin, Robert and Thomas for mobilizing to get this fixed.

Re: [HACKERS] multixacts woes

2015-05-10 Thread Andrew Dunstan
On 05/10/2015 10:30 AM, Robert Haas wrote: 2. We have some logic that causes autovacuum to run in spite of autovacuum=off when wraparound threatens. My commit 53bb309d2d5a9432d2602c93ed18e58bd2924e15 provided most of the anti-wraparound protections for multixact members that exist for

Re: [HACKERS] multixacts woes

2015-05-10 Thread José Luis Tallón
On 05/08/2015 09:57 PM, Josh Berkus wrote: [snip] It's certainly possible to have workloads triggering that, but I think it's relatively uncommon. In most cases I've checked, the multixact consumption rate is much lower than the xid consumption. There are some exceptions, but often that's

Re: [HACKERS] multixacts woes

2015-05-10 Thread Robert Haas
On Fri, May 8, 2015 at 5:39 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: 1. I believe that there is still a narrow race condition that causes the multixact code to go crazy and delete all of its data when operating very near the threshold for member space exhaustion. See

Re: [HACKERS] multixacts woes

2015-05-08 Thread Andres Freund
On 2015-05-08 14:32:14 -0400, Robert Haas wrote: On Fri, May 8, 2015 at 2:27 PM, Andres Freund and...@anarazel.de wrote: On 2015-05-08 14:15:44 -0400, Robert Haas wrote: Apparently, we have been hanging our hat since the release of 9.3.0 on the theory that the average multixact won't ever

[HACKERS] multixacts woes

2015-05-08 Thread Robert Haas
My colleague Thomas Munro and I have been working with Alvaro, and also with Kevin and Amit, to fix bug #12990, a multixact-related data corruption bug. I somehow did not realize until very recently that we actually use two SLRUs to keep track of multixacts: one for the multixacts themselves

Re: [HACKERS] multixacts woes

2015-05-08 Thread Robert Haas
On Fri, May 8, 2015 at 2:27 PM, Andres Freund and...@anarazel.de wrote: On 2015-05-08 14:15:44 -0400, Robert Haas wrote: Apparently, we have been hanging our hat since the release of 9.3.0 on the theory that the average multixact won't ever have more than two members, and therefore the members

Re: [HACKERS] multixacts woes

2015-05-08 Thread Andres Freund
Hi, On 2015-05-08 14:15:44 -0400, Robert Haas wrote: Apparently, we have been hanging our hat since the release of 9.3.0 on the theory that the average multixact won't ever have more than two members, and therefore the members SLRU won't overwrite itself and corrupt data. It's essentially a

Re: [HACKERS] multixacts woes

2015-05-08 Thread Alvaro Herrera
Josh Berkus wrote: I have a couple workloads in my pool which do consume mxids faster than xids, due to (I think) exceptional numbers of FK conflicts. It's definitely unusual, though, and I'm sure they'd rather have corruption protection and endure some more vacuums. If we do this, though,

Re: [HACKERS] multixacts woes

2015-05-08 Thread Alvaro Herrera
Robert Haas wrote: My colleague Thomas Munro and I have been working with Alvaro, and also with Kevin and Amit, to fix bug #12990, a multixact-related data corruption bug. Thanks for this great summary of the situation. 1. I believe that there is still a narrow race condition that causes

Re: [HACKERS] multixacts woes

2015-05-08 Thread Josh Berkus
On 05/08/2015 11:27 AM, Andres Freund wrote: Hi, On 2015-05-08 14:15:44 -0400, Robert Haas wrote: 3. It seems to me that there is a danger that some users could see extremely frequent anti-mxid-member-wraparound vacuums as a result of this work. Granted, that beats data corruption or

Re: [HACKERS] multixacts woes

2015-05-08 Thread Andres Freund
On 2015-05-08 12:57:17 -0700, Josh Berkus wrote: I have a couple workloads in my pool which do consume mxids faster than xids, due to (I think) exceptional numbers of FK conflicts. It's definitely unusual, though, and I'm sure they'd rather have corruption protection and endure some more