On 2014-12-25 14:36:42 -0500, Tom Lane wrote:
> I wonder whether when multiple processes are demanding statsfile updates,
> there's some misbehavior that causes them to suck CPU away from the stats
> collector and/or convince it that it doesn't need to write anything.
> There are odd things in the logs in some of these events.  For example in
> today's failure on hamster,
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&dt=2014-12-25%2016%3A00%3A07
> there are two client-visible wait-timeout warnings, one each in the
> gist and spgist tests.  But in the postmaster log we find these in
> fairly close succession:
> [549c38ba.724d:2] WARNING:  pgstat wait timeout
> [549c39b1.73e7:10] WARNING:  pgstat wait timeout
> [549c38ba.724d:3] WARNING:  pgstat wait timeout
> Correlating these with other log entries shows that the first and third
> are from the autovacuum launcher while the second is from the gist test
> session.  So the spgist failure failed to get logged, and in any case the
> big picture is that we had four timeout warnings occurring in a pretty
> short span of time, in a parallel test set that's not all that demanding
> (12 parallel tests, well below our max).  Not sure what to make of that.

My guess is that a checkpoint happened at that time. Maybe it'd be a
good idea to make pg_regress start postgres with log_checkpoints
enabled? I suspect we'd find horrendous 'sync' times.

Michael: Could you perhaps turn log_checkpoints on in the config?
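
A minimal sketch of the setting in question, as a postgresql.conf fragment. How it reaches the test instance (the buildfarm client's extra configuration, or a temporary config file handed to pg_regress) is an assumption here and depends on the animal's setup:

```
# postgresql.conf fragment: log each checkpoint, including its sync timings
log_checkpoints = on
```

With this enabled, the checkpoint completion lines in the postmaster log can be compared against the timestamps of the "pgstat wait timeout" warnings.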

> BTW, I notice that in the current state of pgstat.c, all the logic for
> keeping track of request arrival times is dead code, because nothing is
> actually looking at DBWriteRequest.request_time.  This makes me think that
> somebody simplified away some logic we maybe should have kept.  But if
> we're going to leave it like this, we could replace the DBWriteRequest
> data structure with a simple OID list and save a fair amount of code.

That's indeed odd. Seems to have been lost when the statsfile was split
into multiple files. Alvaro, Tomas?
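
To make the suggested simplification concrete, here is a rough standalone sketch of pending write requests kept as a plain list of database OIDs with no arrival time. It is not pgstat.c code: the `Oid` typedef, `PendingWriteRequests`, `MAX_PENDING`, and all function names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;       /* stand-in for PostgreSQL's Oid type */

#define MAX_PENDING 64          /* hypothetical fixed capacity for this sketch */

/* Pending write requests reduced to a bare list of database OIDs. */
typedef struct PendingWriteRequests
{
    Oid     dbids[MAX_PENDING];
    size_t  count;
} PendingWriteRequests;

/* Is a write already requested for this database? */
static bool
write_request_pending(const PendingWriteRequests *reqs, Oid dbid)
{
    for (size_t i = 0; i < reqs->count; i++)
    {
        if (reqs->dbids[i] == dbid)
            return true;
    }
    return false;
}

/*
 * Queue a write request for dbid.  Returns true if a new entry was added,
 * false if it was already queued or the list is full.
 */
static bool
add_write_request(PendingWriteRequests *reqs, Oid dbid)
{
    assert(reqs != NULL);

    if (write_request_pending(reqs, dbid))
        return false;           /* duplicates collapse into one entry */
    if (reqs->count >= MAX_PENDING)
        return false;           /* sketch only: no growth strategy */

    reqs->dbids[reqs->count++] = dbid;
    return true;
}

/* Forget all pending requests, e.g. after the files have been written. */
static void
clear_write_requests(PendingWriteRequests *reqs)
{
    reqs->count = 0;
}
```

Since nothing reads the request time, the only state such a list has to carry is "which databases need their file rewritten", which is why a membership check plus a reset would suffice in this sketch.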

I wondered for a second whether the split could be responsible somehow,
but there are reports of that in older back branches as well:


Andres Freund

 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
