On Mon, Mar 14, 2016 at 01:33:08PM +0100, Tomas Vondra wrote:
> On 03/14/2016 07:14 AM, Noah Misch wrote:
> >On Mon, Mar 14, 2016 at 02:00:03AM +0100, Tomas Vondra wrote:
> >>+    * XXX Maybe this should also care about the clock skew, just like the
> >>+    *     block a few lines down.
> >
> >Yes, it should.  (The problem is large (>~100s), backward clock resets, not
> >skew.)  A clock reset causing "msg->clock_time < dbentry->stats_timestamp"
> >will usually also cause "msg->cutoff_time < dbentry->stats_timestamp".  Such
> >cases need the correction a few lines down.
> 
> I'll look into that. I have to admit I have a hard time reasoning about the
> code handling clock skew, so it might take some time, though.

No hurry; it would be no problem to delay this several months.

> >The other thing needed here is to look through and update comments about
> >last_statrequests.  For example, this loop is dead code due to us writing
> >files as soon as we receive one inquiry:
> >
> >     /*
> >      * Find the last write request for this DB.  If it's older than the
> >      * request's cutoff time, update it; otherwise there's nothing to do.
> >      *
> >      * Note that if a request is found, we return early and skip the below
> >      * check for clock skew.  This is okay, since the only way for a DB
> >      * request to be present in the list is that we have been here since the
> >      * last write round.
> >      */
> >     slist_foreach(iter, &last_statrequests) ...
> >
> >I'm okay keeping the dead code for future flexibility, but the comment should
> >reflect that.
> 
> Yes, that's another thing that I'd like to look into. Essentially the
> problem is that we always process the inquiries one by one, so we never
> actually see a list with more than a single element. Correct?

Correct.

> I think the best way to address that is to first check how much data is
> in the UDP queue, and then fetch all of it before actually doing the
> writes. Peeking at the number of requests first (or even using some
> reasonable hard-coded limit) should prevent starving the inquirers in
> case of a steady stream of inquiries.

Now that you mention it, a hard-coded limit sounds good: write the files for
pending inquiries whenever the socket empties or every N messages processed,
whichever comes first.  I don't think the amount of pending UDP data is
portably available, and I doubt it's important.  Breaking every, say, one
thousand messages will make the collector predictably responsive to inquiries,
and that is the important property.

I would lean toward making this part 9.7-only; it would be a distinct patch
from the one previously under discussion.

Thanks,
nm


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
