Re: [HACKERS] [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1

2015-06-16 Thread Alvaro Herrera
Alvaro Herrera wrote: I see another hole in this area. See do_start_worker() -- there we only consider the offsets limit to determine a database to be in almost-wrapped-around state (causing emergency attention). If the database in members trouble has no pgstat entry, it might get

Re: [HACKERS] [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1

2015-06-15 Thread Alvaro Herrera
Andres Freund wrote: A first version to address this problem can be found appended to this email. Basically it does: * Whenever more than MULTIXACT_MEMBER_SAFE_THRESHOLD are used, signal autovacuum once per members segment * For both members and offsets, once hitting the hard limits,

Re: [HACKERS] [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1

2015-06-15 Thread Robert Haas
On Fri, Jun 12, 2015 at 7:27 PM, Steve Kehlet steve.keh...@gmail.com wrote: Just wanted to report that I rolled back my VM to where it was with 9.4.2 installed and it wouldn't start. I installed 9.4.4 and now it starts up just fine: 2015-06-12 16:05:58 PDT [6453]: [1-1] LOG: database system

Re: [HACKERS] [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1

2015-06-12 Thread Steve Kehlet
Just wanted to report that I rolled back my VM to where it was with 9.4.2 installed and it wouldn't start. I installed 9.4.4 and now it starts up just fine: 2015-06-12 16:05:58 PDT [6453]: [1-1] LOG: database system was shut down at 2015-05-27 13:12:55 PDT 2015-06-12 16:05:58 PDT [6453]: [2-1]

Re: [HACKERS] [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1

2015-06-11 Thread Jeff Janes
On Wed, Jun 10, 2015 at 7:16 PM, Noah Misch n...@leadboat.com wrote: On Mon, Jun 08, 2015 at 03:15:04PM +0200, Andres Freund wrote: One more thing: Our testing infrastructure sucks. Without writing C code it's basically impossible to test wraparounds and such. Even if not particularly

Re: [HACKERS] [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1

2015-06-10 Thread Noah Misch
On Mon, Jun 08, 2015 at 03:15:04PM +0200, Andres Freund wrote: One more thing: Our testing infrastructure sucks. Without writing C code it's basically impossible to test wraparounds and such. Even if not particularly useful for non-devs, I really think we should have functions for creating

Re: [HACKERS] [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1

2015-06-08 Thread Alvaro Herrera
Robert Haas wrote: On Mon, Jun 8, 2015 at 1:23 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: (My personal alarm bells go off when I see autovac_naptime=15min or more, but apparently not everybody sees things that way.) Uh, I'd echo that sentiment if you did s/15min/1min/ Yeah,

Re: [HACKERS] [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1

2015-06-08 Thread Robert Haas
On Mon, Jun 8, 2015 at 1:23 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Andres Freund wrote: On June 8, 2015 7:06:31 PM GMT+02:00, Alvaro Herrera alvhe...@2ndquadrant.com wrote: I might be misreading the code, but PMSIGNAL_START_AUTOVAC_LAUNCHER only causes things to happen (i.e. a

Re: [HACKERS] [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1

2015-06-08 Thread Alvaro Herrera
Andres Freund wrote: On June 8, 2015 7:06:31 PM GMT+02:00, Alvaro Herrera alvhe...@2ndquadrant.com wrote: I might be misreading the code, but PMSIGNAL_START_AUTOVAC_LAUNCHER only causes things to happen (i.e. a new worker to be started) when autovacuum is disabled. If autovacuum is

Re: [HACKERS] [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1

2015-06-08 Thread Andres Freund
On 2015-06-08 14:23:32 -0300, Alvaro Herrera wrote: Sure. I just concern that we might be putting excessive trust on emergency workers being launched at a high pace. I'm not sure what to do about that. I mean, it'd not be hard to simply ignore naptime upon wraparound, but I'm not sure that'd

Re: [HACKERS] [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1

2015-06-08 Thread Alvaro Herrera
Andres Freund wrote: A first version to address this problem can be found appended to this email. Basically it does: * Whenever more than MULTIXACT_MEMBER_SAFE_THRESHOLD are used, signal autovacuum once per members segment * For both members and offsets, once hitting the hard limits,

Re: [HACKERS] [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1

2015-06-08 Thread Andres Freund
On June 8, 2015 7:06:31 PM GMT+02:00, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Andres Freund wrote: A first version to address this problem can be found appended to this email. Basically it does: * Whenever more than MULTIXACT_MEMBER_SAFE_THRESHOLD are used, signal autovacuum once

Re: [HACKERS] [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1

2015-06-08 Thread Andres Freund
On 2015-06-05 20:47:33 +0200, Andres Freund wrote: On 2015-06-05 14:33:12 -0400, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: 1. The problem that we might truncate an SLRU members page away when it's in the buffers, but not drop it from the buffers, leading to a failure

Re: [HACKERS] [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1

2015-06-08 Thread Andres Freund
On 2015-06-05 16:56:18 -0400, Tom Lane wrote: Andres Freund and...@anarazel.de writes: On June 5, 2015 10:02:37 PM GMT+02:00, Robert Haas robertmh...@gmail.com wrote: I think we would be foolish to rush that part into the tree. We probably got here in the first place by rushing the last

Re: [HACKERS] [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1

2015-06-08 Thread Andres Freund
On 2015-06-08 15:15:04 +0200, Andres Freund wrote: 1) the autovacuum trigger logic isn't perfect yet. I.e. especially with autovacuum=off you can get into situations where emergency vacuums aren't started when necessary. This is particularly likely to happen if either very large

Re: [HACKERS] [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1

2015-06-05 Thread Robert Haas
On Fri, Jun 5, 2015 at 12:00 PM, Andres Freund and...@anarazel.de wrote: On 2015-06-05 11:43:45 -0400, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Fri, Jun 5, 2015 at 2:20 AM, Noah Misch n...@leadboat.com wrote: I read through this version and found nothing to change. I

Re: [HACKERS] [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1

2015-06-05 Thread Steve Kehlet
On Fri, Jun 5, 2015 at 11:47 AM Andres Freund and...@anarazel.de wrote: But I'd definitely like some independent testing for it, and I'm not sure if that's doable in time for the wrap. I'd be happy to test on my database that was broken, for however much that helps. It's a VM so I can easily

Re: [HACKERS] [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1

2015-06-05 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Fri, Jun 5, 2015 at 12:00 PM, Andres Freund and...@anarazel.de wrote: On 2015-06-05 11:43:45 -0400, Tom Lane wrote: So where are we on this? Are we ready to schedule a new set of back-branch releases? If not, what issues remain to be looked at?

Re: [HACKERS] [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1

2015-06-05 Thread Alvaro Herrera
Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: There are at least two other known issues that seem like they should be fixed before we release: 1. The problem that we might truncate an SLRU members page away when it's in the buffers, but not drop it from the buffers, leading

Re: [HACKERS] [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1

2015-06-05 Thread Andres Freund
On 2015-06-05 14:33:12 -0400, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: 1. The problem that we might truncate an SLRU members page away when it's in the buffers, but not drop it from the buffers, leading to a failure when we try to write it later. I've got a fix for this,

Re: [HACKERS] [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1

2015-06-05 Thread Andres Freund
On 2015-06-05 11:43:45 -0400, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Fri, Jun 5, 2015 at 2:20 AM, Noah Misch n...@leadboat.com wrote: I read through this version and found nothing to change. I encourage other hackers to study the patch, though. The surrounding code

Re: [HACKERS] [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1

2015-06-05 Thread Joshua D. Drake
On 06/05/2015 01:56 PM, Tom Lane wrote: If we have confidence that we can ship something on Monday that is materially more trustworthy than the current releases, then let's aim to do that; but let's ship only patches we are confident in. We can do another set of releases later that

Re: [HACKERS] [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1

2015-06-05 Thread Robert Haas
On Fri, Jun 5, 2015 at 2:36 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: There are at least two other known issues that seem like they should be fixed before we release: 1. The problem that we might truncate an SLRU members

Re: [HACKERS] [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1

2015-06-05 Thread Tom Lane
Andres Freund and...@anarazel.de writes: On June 5, 2015 10:02:37 PM GMT+02:00, Robert Haas robertmh...@gmail.com wrote: I think we would be foolish to rush that part into the tree. We probably got here in the first place by rushing the last round of fixes too much; let's try not to double

Re: [HACKERS] [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1

2015-06-05 Thread Alvaro Herrera
Joshua D. Drake wrote: I believe there are likely quite a few parties willing to help test, if we knew how? The code involved is related to checkpoints, pg_basebackups that take a long time to run, and multixact freezing and truncation. If you can set up test servers that eat lots of

Re: [HACKERS] [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1

2015-06-05 Thread Robert Haas
On Fri, Jun 5, 2015 at 2:47 PM, Andres Freund and...@anarazel.de wrote: On 2015-06-05 14:33:12 -0400, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: 1. The problem that we might truncate an SLRU members page away when it's in the buffers, but not drop it from the buffers, leading

Re: [HACKERS] [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1

2015-06-05 Thread Andres Freund
On June 5, 2015 10:02:37 PM GMT+02:00, Robert Haas robertmh...@gmail.com wrote: On Fri, Jun 5, 2015 at 2:47 PM, Andres Freund and...@anarazel.de wrote: On 2015-06-05 14:33:12 -0400, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: 1. The problem that we might truncate an SLRU members

Re: [HACKERS] [GENERAL] 9.4.1 - 9.4.2 problem: could not access status of transaction 1

2015-06-05 Thread Robert Haas
On Fri, Jun 5, 2015 at 4:40 PM, Andres Freund and...@anarazel.de wrote: I think we would be foolish to rush that part into the tree. We probably got here in the first place by rushing the last round of fixes too much; let's try not to double down on that mistake. My problem with that approach