Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2015-01-21 Thread Matt Kelly
Sure, but nobody who is not a developer is going to care about that. A typical user who sees pgstat wait timeout, or doesn't, isn't going to be able to make anything at all out of that. As a user, I wholeheartedly disagree. That warning helped me massively in diagnosing an unhealthy

Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2015-01-21 Thread Andres Freund
On 2015-01-21 22:43:03 -0500, Matt Kelly wrote: Sure, but nobody who is not a developer is going to care about that. A typical user who sees pgstat wait timeout, or doesn't, isn't going to be able to make anything at all out of that. As a user, I wholeheartedly disagree. Note that

Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2015-01-20 Thread Tomas Vondra
On 25.12.2014 22:28, Tomas Vondra wrote: On 25.12.2014 21:14, Andres Freund wrote: That's indeed odd. Seems to have been lost when the statsfile was split into multiple files. Alvaro, Tomas? The goal was to keep the logic as close to the original as possible. IIRC there were pgstat wait

Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2015-01-20 Thread Tomas Vondra
On 21.1.2015 00:38, Michael Paquier wrote: On Wed, Jan 21, 2015 at 1:08 AM, Tomas Vondra I've tried to reproduce this on my Raspberry PI 'machine' and it's not very difficult to trigger this. About 7 out of 10 'make check' runs fail because of 'pgstat wait timeout'. All the occurrences I've

Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2015-01-20 Thread Michael Paquier
On Wed, Jan 21, 2015 at 1:08 AM, Tomas Vondra tomas.von...@2ndquadrant.com wrote: On 25.12.2014 22:28, Tomas Vondra wrote: On 25.12.2014 21:14, Andres Freund wrote: That's indeed odd. Seems to have been lost when the statsfile was split into multiple files. Alvaro, Tomas? The goal was to

Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2014-12-28 Thread Robert Haas
On Sat, Dec 27, 2014 at 8:51 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Sat, Dec 27, 2014 at 7:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: This would have the effect of transferring all responsibility for dead-stats-entry cleanup to autovacuum. For

Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2014-12-27 Thread Heikki Linnakangas
On 12/27/2014 12:16 AM, Alvaro Herrera wrote: Tom Lane wrote: The argument that autovac workers need fresher stats than anything else seems pretty dubious to start with. Why shouldn't we simplify that down to they use PGSTAT_STAT_INTERVAL like everybody else? The point of wanting fresher

Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2014-12-27 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes: On 12/27/2014 12:16 AM, Alvaro Herrera wrote: Tom Lane wrote: The argument that autovac workers need fresher stats than anything else seems pretty dubious to start with. Why shouldn't we simplify that down to they use PGSTAT_STAT_INTERVAL

Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2014-12-27 Thread Robert Haas
On Sat, Dec 27, 2014 at 7:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: On 12/27/2014 12:16 AM, Alvaro Herrera wrote: Tom Lane wrote: The argument that autovac workers need fresher stats than anything else seems pretty dubious to start with. Why

Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2014-12-27 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Sat, Dec 27, 2014 at 7:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: This would have the effect of transferring all responsibility for dead-stats-entry cleanup to autovacuum. For ordinary users, I think that'd be just fine. It might be less fine

Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2014-12-26 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane wrote: Yeah, I've been getting more annoyed by that too lately. I keep wondering though whether there's an actual bug underneath that behavior that we're failing to see. I think the first thing to do is reconsider usage of

Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2014-12-26 Thread Alvaro Herrera
Tom Lane wrote: The argument that autovac workers need fresher stats than anything else seems pretty dubious to start with. Why shouldn't we simplify that down to they use PGSTAT_STAT_INTERVAL like everybody else? The point of wanting fresher stats than that, eons ago, was to avoid a worker

[HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2014-12-25 Thread Andres Freund
Hi, We quite regularly have buildfarm failures that are caused by 'WARNING: pgstat wait timeout' at random points during the build. Especially on some of the ARM buildfarm animals those are really frequent, to the point that it's hard to know the actual state of the buildfarm without looking at

Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2014-12-25 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes: So I think a better way to deal with that warning would be a good idea. Besides somehow making the mechanism there are two ways to attack this that I can think of, neither of them awe inspiring: 1) Make that WARNING a LOG message instead. Since

Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2014-12-25 Thread Andres Freund
On 2014-12-25 14:36:42 -0500, Tom Lane wrote: I wonder whether when multiple processes are demanding statsfile updates, there's some misbehavior that causes them to suck CPU away from the stats collector and/or convince it that it doesn't need to write anything. There are odd things in the

Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2014-12-25 Thread Tomas Vondra
On 25.12.2014 20:36, Tom Lane wrote: Yeah, I've been getting more annoyed by that too lately. I keep wondering though whether there's an actual bug underneath that behavior that we're failing to see. PGSTAT_MAX_WAIT_TIME is already 10 seconds; it's hard to credit that increasing it still

Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2014-12-25 Thread Tom Lane
Tomas Vondra t...@fuzzy.cz writes: On 25.12.2014 20:36, Tom Lane wrote: BTW, I notice that in the current state of pgstat.c, all the logic for keeping track of request arrival times is dead code, because nothing is actually looking at DBWriteRequest.request_time. Really? Which part of the

Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2014-12-25 Thread Tomas Vondra
On 25.12.2014 21:14, Andres Freund wrote: On 2014-12-25 14:36:42 -0500, Tom Lane wrote: My guess is that a checkpoint happened at that time. Maybe it'd be a good idea to make pg_regress start postgres with log_checkpoints enabled? My guess is that we'd find horrendous 'sync' times.

Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2014-12-25 Thread Tom Lane
Tomas Vondra t...@fuzzy.cz writes: The strange thing is that the split happened ~2 years ago, which is inconsistent with the sudden increase of this kind of issues. So maybe something changed on that particular animal (a failing SD card causing I/O stalls, perhaps)? I think that hamster has

Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2014-12-25 Thread Tomas Vondra
On 25.12.2014 22:16, Tom Lane wrote: Tomas Vondra t...@fuzzy.cz writes: On 25.12.2014 20:36, Tom Lane wrote: BTW, I notice that in the current state of pgstat.c, all the logic for keeping track of request arrival times is dead code, because nothing is actually looking at

Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2014-12-25 Thread Tomas Vondra
On 25.12.2014 22:40, Tom Lane wrote: Tomas Vondra t...@fuzzy.cz writes: The strange thing is that the split happened ~2 years ago, which is inconsistent with the sudden increase of this kind of issues. So maybe something changed on that particular animal (a failing SD card causing I/O stalls,

Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2014-12-25 Thread Tom Lane
Tomas Vondra t...@fuzzy.cz writes: On 25.12.2014 22:40, Tom Lane wrote: I think that hamster has basically got a tin can and string for an I/O subsystem. It's not real clear to me whether there's actually been an increase in wait timeout failures recently; somebody would have to go through

Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2014-12-25 Thread Tomas Vondra
On 26.12.2014 02:59, Tom Lane wrote: Tomas Vondra t...@fuzzy.cz writes: On 25.12.2014 22:40, Tom Lane wrote: I think that hamster has basically got a tin can and string for an I/O subsystem. It's not real clear to me whether there's actually been an increase in wait timeout failures

Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2014-12-25 Thread Michael Paquier
On Fri, Dec 26, 2014 at 6:28 AM, Tomas Vondra t...@fuzzy.cz wrote: On 25.12.2014 21:14, Andres Freund wrote: On 2014-12-25 14:36:42 -0500, Tom Lane wrote: My guess is that a checkpoint happened at that time. Maybe it'd be a good idea to make pg_regress start postgres with log_checkpoints

Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2014-12-25 Thread Alvaro Herrera
Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: So I think a better way to deal with that warning would be a good idea. Besides somehow making the mechanism there are two ways to attack this that I can think of, neither of them awe inspiring: 1) Make that WARNING a LOG