Hi,

On 03/14/2016 07:14 AM, Noah Misch wrote:
[Aside: your new mail editor is rewrapping lines in quoted material, and the
result is messy.  I have rerewrapped one paragraph below.]

Thanks, I've noticed that too. I've been testing Evolution in the past few days, and apparently the line wrapping algorithm is broken. I've switched back to Thunderbird, so hopefully that'll fix it.


On Mon, Mar 14, 2016 at 02:00:03AM +0100, Tomas Vondra wrote:
On Sun, 2016-03-13 at 18:46 -0400, Noah Misch wrote:
I've not attempted to study the behavior on slow hardware.  Instead, my
report used stat-coalesce-v1.patch[1] to simulate slow writes.  (That
diagnostic patch no longer applies cleanly, so I'm attaching a rebased
version.  I've changed the patch name from "stat-coalesce" to
"slow-stat-simulate" to more-clearly distinguish it from the
"pgstat-coalesce" patch.)  Problems remain after applying your patch;
consider "VACUUM pg_am" behavior:

9.2 w/ stat-coalesce-v1.patch:
  VACUUM returns in 3s, stats collector writes each file 1x over 3s
HEAD w/ slow-stat-simulate-v2.patch:
  VACUUM returns in 3s, stats collector writes each file 5x over 15s
HEAD w/ slow-stat-simulate-v2.patch and your patch:
  VACUUM returns in 10s, stats collector writes no files over 10s

Oh damn, the timestamp comparison in pgstat_recv_inquiry should be in
the opposite direction. After fixing that "VACUUM pg_am" completes in 3
seconds and writes each file just once.

That fixes things.  "make check" passes under an 8s stats write time.

OK, good.


--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4836,6 +4836,20 @@ pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
        }

        /*
+        * Ignore requests that are already resolved by the last write.
+        *
+        * We discard the queue of requests at the end of 
pgstat_write_statsfiles(),
+        * so the requests already waiting on the UDP socket at that moment 
can't
+        * be discarded in the previous loop.
+        *
+        * XXX Maybe this should also care about the clock skew, just like the
+        *     block a few lines down.

Yes, it should.  (The problem is large (>~100s), backward clock resets, not
skew.)  A clock reset causing "msg->clock_time < dbentry->stats_timestamp"
will usually also cause "msg->cutoff_time < dbentry->stats_timestamp".  Such
cases need the correction a few lines down.

I'll look into that. I have to admit I have a hard time reasoning about the code handling clock skew, so it might take some time, though.


The other thing needed here is to look through and update comments about
last_statrequests.  For example, this loop is dead code due to us writing
files as soon as we receive one inquiry:

        /*
         * Find the last write request for this DB.  If it's older than the
         * request's cutoff time, update it; otherwise there's nothing to do.
         *
         * Note that if a request is found, we return early and skip the below
         * check for clock skew.  This is okay, since the only way for a DB
         * request to be present in the list is that we have been here since the
         * last write round.
         */
        slist_foreach(iter, &last_statrequests) ...

I'm okay keeping the dead code for future flexibility, but the comment should
reflect that.

Yes, that's another thing that I'd like to look into. Essentially the problem is that we always process the inquiries one by one, so we never actually see a list with more than a single element. Correct?

I think the best way to address that is to peek is to first check how much data is in the UDP queue, and then fetching all of that before actually doing the writes. Peeking at the number of requests first (or even some reasonable hard-coded limit) should should prevent starving the inquirers in case of a steady stream or inquiries.

regards
Tomas

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to