[Aside: your new mail editor is rewrapping lines in quoted material, and the result is messy. I have rerewrapped one paragraph below.]
On Mon, Mar 14, 2016 at 02:00:03AM +0100, Tomas Vondra wrote: > On Sun, 2016-03-13 at 18:46 -0400, Noah Misch wrote: > > I've not attempted to study the behavior on slow hardware. Instead, my > > report used stat-coalesce-v1.patch[1] to simulate slow writes. (That > > diagnostic patch no longer applies cleanly, so I'm attaching a rebased > > version. I've changed the patch name from "stat-coalesce" to > > "slow-stat-simulate" to more-clearly distinguish it from the > > "pgstat-coalesce" patch.) Problems remain after applying your patch; > > consider "VACUUM pg_am" behavior: > > > > 9.2 w/ stat-coalesce-v1.patch: > > VACUUM returns in 3s, stats collector writes each file 1x over 3s > > HEAD w/ slow-stat-simulate-v2.patch: > > VACUUM returns in 3s, stats collector writes each file 5x over 15s > > HEAD w/ slow-stat-simulate-v2.patch and your patch: > > VACUUM returns in 10s, stats collector writes no files over 10s > > Oh damn, the timestamp comparison in pgstat_recv_inquiry should be in > the opposite direction. After fixing that "VACUUM pg_am" completes in 3 > seconds and writes each file just once. That fixes things. "make check" passes under an 8s stats write time. > --- a/src/backend/postmaster/pgstat.c > +++ b/src/backend/postmaster/pgstat.c > @@ -4836,6 +4836,20 @@ pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len) > } > > /* > + * Ignore requests that are already resolved by the last write. > + * > + * We discard the queue of requests at the end of > pgstat_write_statsfiles(), > + * so the requests already waiting on the UDP socket at that moment > can't > + * be discarded in the previous loop. > + * > + * XXX Maybe this should also care about the clock skew, just like the > + * block a few lines down. Yes, it should. (The problem is large (>~100s), backward clock resets, not skew.) A clock reset causing "msg->clock_time < dbentry->stats_timestamp" will usually also cause "msg->cutoff_time < dbentry->stats_timestamp". Such cases need the correction a few lines down. The other thing needed here is to look through and update comments about last_statrequests. For example, this loop is dead code due to us writing files as soon as we receive one inquiry: /* * Find the last write request for this DB. If it's older than the * request's cutoff time, update it; otherwise there's nothing to do. * * Note that if a request is found, we return early and skip the below * check for clock skew. This is okay, since the only way for a DB * request to be present in the list is that we have been here since the * last write round. */ slist_foreach(iter, &last_statrequests) ... I'm okay keeping the dead code for future flexibility, but the comment should reflect that. Thanks, nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers