Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

2016-05-31 Thread Tom Lane
Tomas Vondra  writes:
> I've checked how this worked in 9.2 (before the 9.3 patch that split the 
> file per db), and back then last_statrequest (transformed to 
> request_time) was used to decide whether we need to write something. But 
> now we do that by simply checking whether the list is empty.

Right.  In effect, 9.3 moved the decision about "do we need to write stats
because of this request" from the main loop to pgstat_recv_inquiry()
... but we forgot to incorporate any check for whether the request was
already satisfied into pgstat_recv_inquiry().  We can do that, though,
as per either of the patches under discussion --- and once we do so,
maintaining DBWriteRequest.request_time seems a bit pointless.

It's conceivable that we'd try to implement merging of close-together
requests in a way that would take account of how far back the oldest
unsatisfied request is.  But I think that a global oldest-request time
would be sufficient for that; we don't need to track it per-database.
In any case, it's hard to see exactly how to make that work without
putting a gettimeofday() call into the inner message handling loop,
which I believe we won't want to do on performance grounds.  The previous
speculation about doing writes every N messages or when we have no input
to process seems more likely to lead to a useful answer.

regards, tom lane




Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

2016-05-31 Thread Tomas Vondra

On 05/31/2016 07:24 PM, Tom Lane wrote:

Tomas Vondra  writes:

On 05/31/2016 06:59 PM, Tom Lane wrote:

I'm confused here --- are you speaking of having removed
if (msg->cutoff_time > req->request_time)
req->request_time = msg->cutoff_time;
? That is not a check for clock skew, it's intending to be sure that
req->request_time reflects the latest request for this DB when we've
seen more than one request. But since req->request_time isn't
actually being used anywhere, it's useless code.



Ah, you're right. I've made the mistake of writing the e-mail before
drinking any coffee today, and I got distracted by the comment change.



I reformatted the actual check for clock skew, but I do not think I
changed its behavior.



I'm not sure it does not change the behavior, though. request_time only
became unused after you removed the two places that set the value (one
of them in the clock skew check).


Well, it's unused in the sense that the if-test quoted above is the only
place in HEAD that examines the value of request_time.  And since that
if-test only controls whether we change the value, and not whether we
proceed to make the clock skew check, I don't see how it's related
to clock skew or indeed anything else at all.


I see, in that case it indeed is useless.

I've checked how this worked in 9.2 (before the 9.3 patch that split the 
file per db), and back then last_statrequest (transformed to 
request_time) was used to decide whether we need to write something. But 
now we do that by simply checking whether the list is empty.
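
For illustration, the two write decisions side by side (paraphrased from
memory of the code, not exact quotes):

/* 9.2 collector loop: write when a request is newer than the last write */
if (last_statrequest > last_statwrite)
    pgstat_write_statsfile(false);

/* HEAD: write whenever any request is queued */
if (pending_write_requests != NIL)
    pgstat_write_statsfiles(false, false);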


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

2016-05-31 Thread Tom Lane
Tomas Vondra  writes:
> On 05/31/2016 06:59 PM, Tom Lane wrote:
>> I'm confused here --- are you speaking of having removed
>>  if (msg->cutoff_time > req->request_time)
>>  req->request_time = msg->cutoff_time;
>> ? That is not a check for clock skew, it's intending to be sure that
>> req->request_time reflects the latest request for this DB when we've
>> seen more than one request. But since req->request_time isn't
>> actually being used anywhere, it's useless code.

> Ah, you're right. I've made the mistake of writing the e-mail before 
> drinking any coffee today, and I got distracted by the comment change.

>> I reformatted the actual check for clock skew, but I do not think I
>> changed its behavior.

> I'm not sure it does not change the behavior, though. request_time only 
> became unused after you removed the two places that set the value (one 
> of them in the clock skew check).

Well, it's unused in the sense that the if-test quoted above is the only
place in HEAD that examines the value of request_time.  And since that
if-test only controls whether we change the value, and not whether we
proceed to make the clock skew check, I don't see how it's related
to clock skew or indeed anything else at all.

regards, tom lane




Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

2016-05-31 Thread Tomas Vondra

On 05/31/2016 06:59 PM, Tom Lane wrote:

Tomas Vondra  writes:

On 05/26/2016 10:10 PM, Tom Lane wrote:

I posted a patch at
https://www.postgresql.org/message-id/13023.1464213...@sss.pgh.pa.us
which I think is functionally equivalent to what you have here, but
it goes to some lengths to make the code more readable, whereas this
is just adding another layer of complication to something that's
already a mess (eg, the request_time field is quite useless as-is).
So I'd like to propose pushing that in place of this patch ... do you
care to review it first?



I've reviewed the patch today, and it seems fine to me - correct and
achieving the same goal as the patch posted to this thread (plus fixing
the issue with shared catalogs and fixing many comments).


Thanks for reviewing!


FWIW do you still plan to back-patch this? Minimizing the amount of
changes was one of the things I had in mind when writing "my" patch,
which is why I ended up with parts that are less readable.


Yeah, I think it's a bug fix and should be back-patched.  I'm not in
favor of making things more complicated just to reduce the number of
lines a patch touches.


The one change I'm not quite sure about is the removal of clock skew
detection in pgstat_recv_inquiry(). You've removed the first check on
the inquiry, replacing it with this comment:
 It seems sufficient to check for clock skew once per write round.
But the first check was comparing msg/req, while the second check looks
at dbentry/cur_ts. I don't see how those two clock skew check are
redundant - if they are, the comment should explain that I guess.


I'm confused here --- are you speaking of having removed

if (msg->cutoff_time > req->request_time)
req->request_time = msg->cutoff_time;

? That is not a check for clock skew, it's intending to be sure that
req->request_time reflects the latest request for this DB when we've
seen more than one request. But since req->request_time isn't
actually being used anywhere, it's useless code.


Ah, you're right. I've made the mistake of writing the e-mail before 
drinking any coffee today, and I got distracted by the comment change.



I reformatted the actual check for clock skew, but I do not think I
changed its behavior.


I'm not sure it does not change the behavior, though. request_time only 
became unused after you removed the two places that set the value (one 
of them in the clock skew check).


I'm not sure this is a bad change, though. But there was a dependency 
between the new request and the preceding one.





Another thing is that if you believe merging requests across databases
is a silly idea, maybe we should bite the bullet and replace the list of
requests with a single item. I'm not convinced about this, though.


No, I don't want to do that either.  We're not spending much code by
having pending_write_requests be a list rather than a single entry,
and we might eventually figure out a reasonable way to time the flushes
so that we can merge requests.



+1

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

2016-05-31 Thread Tom Lane
Tomas Vondra  writes:
> On 05/26/2016 10:10 PM, Tom Lane wrote:
>> I posted a patch at
>> https://www.postgresql.org/message-id/13023.1464213...@sss.pgh.pa.us
>> which I think is functionally equivalent to what you have here, but
>> it goes to some lengths to make the code more readable, whereas this
>> is just adding another layer of complication to something that's
>> already a mess (eg, the request_time field is quite useless as-is).
>> So I'd like to propose pushing that in place of this patch ... do you
>> care to review it first?

> I've reviewed the patch today, and it seems fine to me - correct and 
> achieving the same goal as the patch posted to this thread (plus fixing 
> the issue with shared catalogs and fixing many comments).

Thanks for reviewing!

> FWIW do you still plan to back-patch this? Minimizing the amount of 
> changes was one of the things I had in mind when writing "my" patch, 
> which is why I ended up with parts that are less readable.

Yeah, I think it's a bug fix and should be back-patched.  I'm not in
favor of making things more complicated just to reduce the number of
lines a patch touches.

> The one change I'm not quite sure about is the removal of clock skew 
> detection in pgstat_recv_inquiry(). You've removed the first check on 
> the inquiry, replacing it with this comment:
>  It seems sufficient to check for clock skew once per write round.
> But the first check was comparing msg/req, while the second check looks 
> at dbentry/cur_ts. I don't see how those two clock skew checks are 
> redundant - if they are, the comment should explain that, I guess.

I'm confused here --- are you speaking of having removed

if (msg->cutoff_time > req->request_time)
req->request_time = msg->cutoff_time;

?  That is not a check for clock skew, it's intending to be sure that
req->request_time reflects the latest request for this DB when we've seen
more than one request.  But since req->request_time isn't actually being
used anywhere, it's useless code.

I reformatted the actual check for clock skew, but I do not think I
changed its behavior.

> Another thing is that if you believe merging requests across databases 
> is a silly idea, maybe we should bite the bullet and replace the list of 
> requests with a single item. I'm not convinced about this, though.

No, I don't want to do that either.  We're not spending much code by
having pending_write_requests be a list rather than a single entry,
and we might eventually figure out a reasonable way to time the flushes
so that we can merge requests.

regards, tom lane




Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

2016-05-31 Thread Tomas Vondra

Hi,

On 05/26/2016 10:10 PM, Tom Lane wrote:

Tomas Vondra  writes:

Attached is a patch that should fix the coalescing, including the clock
skew detection. In the end I reorganized the code a bit, moving the
check to the end, after the clock skew detection. Otherwise I'd have to
do the clock skew detection in multiple places, and that seemed ugly.


I hadn't been paying any attention to this thread, I must confess.
But I rediscovered this no-coalescing problem while pursuing the poor
behavior for shared catalogs that Peter complained of in
https://www.postgresql.org/message-id/56ad41ac.1030...@gmx.net

I posted a patch at
https://www.postgresql.org/message-id/13023.1464213...@sss.pgh.pa.us
which I think is functionally equivalent to what you have here, but
it goes to some lengths to make the code more readable, whereas this
is just adding another layer of complication to something that's
already a mess (eg, the request_time field is quite useless as-is).
So I'd like to propose pushing that in place of this patch ... do you
care to review it first?


I've reviewed the patch today, and it seems fine to me - correct and 
achieving the same goal as the patch posted to this thread (plus fixing 
the issue with shared catalogs and fixing many comments).


FWIW do you still plan to back-patch this? Minimizing the amount of 
changes was one of the things I had in mind when writing "my" patch, 
which is why I ended up with parts that are less readable.


The one change I'm not quite sure about is the removal of clock skew 
detection in pgstat_recv_inquiry(). You've removed the first check on 
the inquiry, replacing it with this comment:


It seems sufficient to check for clock skew once per write round.

But the first check was comparing msg/req, while the second check looks 
at dbentry/cur_ts. I don't see how those two clock skew checks are 
redundant - if they are, the comment should explain that, I guess.
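
For reference, a simplified paraphrase of the check that survives (the
real code also logs a message and backdates somewhat further; this only
shows the shape of the logic, with GetCurrentTimestamp() left as a stub):

typedef long long TimestampTz;

extern TimestampTz GetCurrentTimestamp(void);   /* stub for the sketch */

static void
correct_for_clock_reset(TimestampTz msg_clock_time,
                        TimestampTz *stats_timestamp)
{
    if (msg_clock_time < *stats_timestamp)
    {
        TimestampTz cur_ts = GetCurrentTimestamp();

        if (cur_ts < *stats_timestamp)
        {
            /* The stats file claims to be from our future, so the system
             * clock must have gone backwards.  Backdate the recorded
             * write time so write requests are honored again. */
            *stats_timestamp = cur_ts;
        }
        /* else the inquiry merely arrived late; nothing to correct */
    }
}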


Another thing is that if you believe merging requests across databases 
is a silly idea, maybe we should bite the bullet and replace the list of 
requests with a single item. I'm not convinced about this, though.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

2016-05-26 Thread Tom Lane
Tomas Vondra  writes:
> On 05/26/2016 10:10 PM, Tom Lane wrote:
>> In view of 52e8fc3e2, there's more or less no case in which we'd be
>> writing stats without writing stats for the shared catalogs. So I'm
>> tempted to propose that we try to reduce the overhead by merging the
>> shared-catalog stats back into the global-stats file, thereby
>> halving the filesystem metadata traffic for updating those.

> I find this a bit contradictory with the previous paragraph. If you 
> believe that reducing the filesystem metadata traffic will have a 
> measurable benefit, then surely merging writes for multiple dbs (thus 
> not writing the global/shared files multiple times) will have even 
> higher impact, no?

Well, my thinking is that this is something we could get "for free"
without any penalty in response time.  Going further will require
some sort of tradeoff.

regards, tom lane




Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

2016-05-26 Thread Michael Paquier
On Thu, May 26, 2016 at 6:43 PM, Tomas Vondra wrote:
> On 05/26/2016 10:10 PM, Tom Lane wrote:
>> Tomas Vondra  writes:
>> In view of 52e8fc3e2, there's more or less no case in which we'd be
>> writing stats without writing stats for the shared catalogs. So I'm
>> tempted to propose that we try to reduce the overhead by merging the
>> shared-catalog stats back into the global-stats file, thereby
>> halving the filesystem metadata traffic for updating those.
>
> [...]
>
> That being said, I'm not opposed to merging the shared catalog into the
> global-stats file - it's not really a separate database so having it in a
> separate file is a bit weird.

While looking at this stuff, to be honest I got surprised that shared
relation stats are located in a file whose name depends on
InvalidOid, so +1 from here as well to merge that into the global
stats file.
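
For illustration, roughly how the per-database filename is derived
(paraphrased from pgstat.c, with the directory choice and buffer handling
simplified); shared relations hang off the pseudo-database InvalidOid,
hence a "db_0.stat" file:

#include <stdio.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

static void
get_dbstat_filename_sketch(Oid databaseid, int tempname,
                           char *filename, int len)
{
    snprintf(filename, len, "pg_stat_tmp/db_%u.%s",
             databaseid, tempname ? "tmp" : "stat");
}

int
main(void)
{
    char buf[64];

    get_dbstat_filename_sketch(InvalidOid, 0, buf, sizeof(buf));
    printf("%s\n", buf);    /* prints pg_stat_tmp/db_0.stat */
    return 0;
}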
-- 
Michael




Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

2016-05-26 Thread Tomas Vondra

Hi,

On 05/26/2016 10:10 PM, Tom Lane wrote:

Tomas Vondra  writes:

Attached is a patch that should fix the coalescing, including the clock
skew detection. In the end I reorganized the code a bit, moving the
check to the end, after the clock skew detection. Otherwise I'd have to
do the clock skew detection in multiple places, and that seemed ugly.


I hadn't been paying any attention to this thread, I must confess.
But I rediscovered this no-coalescing problem while pursuing the poor
behavior for shared catalogs that Peter complained of in
https://www.postgresql.org/message-id/56ad41ac.1030...@gmx.net

I posted a patch at
https://www.postgresql.org/message-id/13023.1464213...@sss.pgh.pa.us
which I think is functionally equivalent to what you have here, but
it goes to some lengths to make the code more readable, whereas this
is just adding another layer of complication to something that's
already a mess (eg, the request_time field is quite useless as-is).
So I'd like to propose pushing that in place of this patch ... do
you care to review it first?


I do care and I'll look at it over the next few days. FWIW when writing 
that patch I intentionally refrained from major changes, as I think the 
plan was to backpatch it. But +1 for more readable code from me.




Reacting to the thread overall:

I see Noah's concern about wanting to merge the write work for
requests about different databases. I've got mixed feelings about
that: it's arguable that any such change would make things worse not
better. In particular, it's inevitable that trying to merge writes
will result in delaying the response to the first request, whether
or not we are able to merge anything. That's not good in itself, and
it means that we can't hope to merge requests over any very long
interval, which very possibly will prevent any merging from
happening in real situations. Also, considering that we know the
stats collector can be pretty slow to respond at all under load, I'm
 worried that this would result in more outright failures.

Moreover, what we'd hope to gain from merging is fewer writes of the
global stats file and the shared-catalog stats file; but neither of
those are very big, so I'm skeptical of what we'd win.


Yep. Clearly there's a trade-off between slowing down response to the 
first request vs. speeding up the whole process, but as you point out we 
probably can't gain enough to justify that.


I wonder if that's still true on clusters with many databases (say, 
shared systems with thousands of dbs). Perhaps walking the list just 
once would save enough CPU to make this a win.


In any case, if we decide to abandon the idea of merging requests for 
multiple databases, that probably means we can further simplify the 
code. last_statrequests is a list but it actually never contains more 
than just a single request. We kept it that way because of the plan to 
add the merging. But if that's not worth it ...




In view of 52e8fc3e2, there's more or less no case in which we'd be
writing stats without writing stats for the shared catalogs. So I'm
tempted to propose that we try to reduce the overhead by merging the
shared-catalog stats back into the global-stats file, thereby
halving the filesystem metadata traffic for updating those.


I find this a bit contradictory with the previous paragraph. If you 
believe that reducing the filesystem metadata traffic will have a 
measurable benefit, then surely merging writes for multiple dbs (thus 
not writing the global/shared files multiple times) will have even 
higher impact, no?


E.g. let's assume we're still writing the global+shared+db files for 
each database. If there are requests for 10 databases, we'll write 30 
files. If we merge those requests first, we're writing only 12 files.


So I'm not sure about the metadata traffic argument, we'd need to see 
some numbers showing it really makes a difference.


That being said, I'm not opposed to merging the shared catalog into the 
global-stats file - it's not really a separate database so having it in 
a separate file is a bit weird.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

2016-05-26 Thread Tom Lane
Tomas Vondra  writes:
> Attached is a patch that should fix the coalescing, including the clock 
> skew detection. In the end I reorganized the code a bit, moving the 
> check to the end, after the clock skew detection. Otherwise I'd have to 
> do the clock skew detection in multiple places, and that seemed ugly.

I hadn't been paying any attention to this thread, I must confess.
But I rediscovered this no-coalescing problem while pursuing the poor
behavior for shared catalogs that Peter complained of in
https://www.postgresql.org/message-id/56ad41ac.1030...@gmx.net

I posted a patch at
https://www.postgresql.org/message-id/13023.1464213...@sss.pgh.pa.us
which I think is functionally equivalent to what you have here, but
it goes to some lengths to make the code more readable, whereas this
is just adding another layer of complication to something that's
already a mess (eg, the request_time field is quite useless as-is).
So I'd like to propose pushing that in place of this patch ... do you
care to review it first?

Reacting to the thread overall:

I see Noah's concern about wanting to merge the write work for requests
about different databases.  I've got mixed feelings about that: it's
arguable that any such change would make things worse not better.
In particular, it's inevitable that trying to merge writes will result
in delaying the response to the first request, whether or not we are
able to merge anything.  That's not good in itself, and it means that
we can't hope to merge requests over any very long interval, which very
possibly will prevent any merging from happening in real situations.
Also, considering that we know the stats collector can be pretty slow
to respond at all under load, I'm worried that this would result in
more outright failures.

Moreover, what we'd hope to gain from merging is fewer writes of the
global stats file and the shared-catalog stats file; but neither of
those are very big, so I'm skeptical of what we'd win.

In view of 52e8fc3e2, there's more or less no case in which we'd be
writing stats without writing stats for the shared catalogs.  So I'm
tempted to propose that we try to reduce the overhead by merging the
shared-catalog stats back into the global-stats file, thereby halving
the filesystem metadata traffic for updating those.

regards, tom lane




Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

2016-03-18 Thread Tomas Vondra

Hi,

On 03/15/2016 03:04 AM, Noah Misch wrote:

On Mon, Mar 14, 2016 at 01:33:08PM +0100, Tomas Vondra wrote:

On 03/14/2016 07:14 AM, Noah Misch wrote:

On Mon, Mar 14, 2016 at 02:00:03AM +0100, Tomas Vondra wrote:

+* XXX Maybe this should also care about the clock skew, just like the
+* block a few lines down.


Yes, it should.  (The problem is large (>~100s), backward clock resets, not
skew.)  A clock reset causing "msg->clock_time < dbentry->stats_timestamp"
will usually also cause "msg->cutoff_time < dbentry->stats_timestamp".  Such
cases need the correction a few lines down.


I'll look into that. I have to admit I have a hard time reasoning about the
code handling clock skew, so it might take some time, though.


No hurry; it would be no problem to delay this several months.


Attached is a patch that should fix the coalescing, including the clock 
skew detection. In the end I reorganized the code a bit, moving the 
check to the end, after the clock skew detection. Otherwise I'd have to 
do the clock skew detection in multiple places, and that seemed ugly.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgstat-coalesce-v3.patch
Description: binary/octet-stream



Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

2016-03-14 Thread Noah Misch
On Mon, Mar 14, 2016 at 01:33:08PM +0100, Tomas Vondra wrote:
> On 03/14/2016 07:14 AM, Noah Misch wrote:
> >On Mon, Mar 14, 2016 at 02:00:03AM +0100, Tomas Vondra wrote:
> >>+* XXX Maybe this should also care about the clock skew, just like the
> >>+* block a few lines down.
> >
> >Yes, it should.  (The problem is large (>~100s), backward clock resets, not
> >skew.)  A clock reset causing "msg->clock_time < dbentry->stats_timestamp"
> >will usually also cause "msg->cutoff_time < dbentry->stats_timestamp".  Such
> >cases need the correction a few lines down.
> 
> I'll look into that. I have to admit I have a hard time reasoning about the
> code handling clock skew, so it might take some time, though.

No hurry; it would be no problem to delay this several months.

> >The other thing needed here is to look through and update comments about
> >last_statrequests.  For example, this loop is dead code due to us writing
> >files as soon as we receive one inquiry:
> >
> > /*
> >  * Find the last write request for this DB.  If it's older than the
> >  * request's cutoff time, update it; otherwise there's nothing to do.
> >  *
> >  * Note that if a request is found, we return early and skip the below
> >  * check for clock skew.  This is okay, since the only way for a DB
> >  * request to be present in the list is that we have been here since the
> >  * last write round.
> >  */
> > slist_foreach(iter, &last_statrequests) ...
> >
> >I'm okay keeping the dead code for future flexibility, but the comment should
> >reflect that.
> 
> Yes, that's another thing that I'd like to look into. Essentially the
> problem is that we always process the inquiries one by one, so we never
> actually see a list with more than a single element. Correct?

Correct.

> I think the best way to address that is to first check how much
> data is in the UDP queue, and then fetch all of that before actually
> doing the writes. Peeking at the number of requests first (or even some
> reasonable hard-coded limit) should prevent starving the inquirers in
> case of a steady stream of inquiries.

Now that you mention it, a hard-coded limit sounds good: write the files for
pending inquiries whenever the socket empties or every N messages processed,
whichever comes first.  I don't think the amount of pending UDP data is
portably available, and I doubt it's important.  Breaking every, say, one
thousand messages will make the collector predictably responsive to inquiries,
and that is the important property.
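
A toy simulation of that policy (invented names, nothing from pgstat.c;
it only demonstrates the "every N messages or when idle" flushing):

#include <stdbool.h>
#include <stdio.h>

#define FLUSH_EVERY 1000        /* the hard-coded limit discussed above */

static int pending = 0;         /* write requests queued so far */
static int msgs_since_flush = 0;

static void
flush_pending(const char *why)
{
    if (pending > 0)
        printf("write stats for %d pending request(s): %s\n", pending, why);
    pending = 0;
    msgs_since_flush = 0;
}

/* called once per message pulled off the (simulated) socket */
static void
handle_message(bool is_inquiry)
{
    if (is_inquiry)
        pending++;
    if (++msgs_since_flush >= FLUSH_EVERY)
        flush_pending("message-count limit reached");
}

int
main(void)
{
    /* a burst of 2500 inquiries, then the socket goes idle */
    for (int i = 0; i < 2500; i++)
        handle_message(true);
    flush_pending("socket empty");  /* flushes twice in the loop, once here */
    return 0;
}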

I would lean toward making this part 9.7-only; it would be a distinct patch
from the one previously under discussion.

Thanks,
nm




Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

2016-03-14 Thread Tomas Vondra

Hi,

On 03/14/2016 07:14 AM, Noah Misch wrote:

[Aside: your new mail editor is rewrapping lines in quoted material, and the
result is messy.  I have rerewrapped one paragraph below.]


Thanks, I've noticed that too. I've been testing Evolution in the past 
few days, and apparently the line wrapping algorithm is broken. I've 
switched back to Thunderbird, so hopefully that'll fix it.




On Mon, Mar 14, 2016 at 02:00:03AM +0100, Tomas Vondra wrote:

On Sun, 2016-03-13 at 18:46 -0400, Noah Misch wrote:

I've not attempted to study the behavior on slow hardware.  Instead, my
report used stat-coalesce-v1.patch[1] to simulate slow writes.  (That
diagnostic patch no longer applies cleanly, so I'm attaching a rebased
version.  I've changed the patch name from "stat-coalesce" to
"slow-stat-simulate" to more-clearly distinguish it from the
"pgstat-coalesce" patch.)  Problems remain after applying your patch;
consider "VACUUM pg_am" behavior:

9.2 w/ stat-coalesce-v1.patch:
  VACUUM returns in 3s, stats collector writes each file 1x over 3s
HEAD w/ slow-stat-simulate-v2.patch:
  VACUUM returns in 3s, stats collector writes each file 5x over 15s
HEAD w/ slow-stat-simulate-v2.patch and your patch:
  VACUUM returns in 10s, stats collector writes no files over 10s


Oh damn, the timestamp comparison in pgstat_recv_inquiry should be in
the opposite direction. After fixing that "VACUUM pg_am" completes in 3
seconds and writes each file just once.


That fixes things.  "make check" passes under an 8s stats write time.


OK, good.




--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4836,6 +4836,20 @@ pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
}

/*
+* Ignore requests that are already resolved by the last write.
+*
+* We discard the queue of requests at the end of pgstat_write_statsfiles(),
+* so the requests already waiting on the UDP socket at that moment can't
+* be discarded in the previous loop.
+*
+* XXX Maybe this should also care about the clock skew, just like the
+* block a few lines down.


Yes, it should.  (The problem is large (>~100s), backward clock resets, not
skew.)  A clock reset causing "msg->clock_time < dbentry->stats_timestamp"
will usually also cause "msg->cutoff_time < dbentry->stats_timestamp".  Such
cases need the correction a few lines down.


I'll look into that. I have to admit I have a hard time reasoning about 
the code handling clock skew, so it might take some time, though.




The other thing needed here is to look through and update comments about
last_statrequests.  For example, this loop is dead code due to us writing
files as soon as we receive one inquiry:

/*
 * Find the last write request for this DB.  If it's older than the
 * request's cutoff time, update it; otherwise there's nothing to do.
 *
 * Note that if a request is found, we return early and skip the below
 * check for clock skew.  This is okay, since the only way for a DB
 * request to be present in the list is that we have been here since the
 * last write round.
 */
slist_foreach(iter, &last_statrequests) ...

I'm okay keeping the dead code for future flexibility, but the comment should
reflect that.


Yes, that's another thing that I'd like to look into. Essentially the 
problem is that we always process the inquiries one by one, so we never 
actually see a list with more than a single element. Correct?


I think the best way to address that is to first check how much data is 
in the UDP queue, and then fetch all of that before actually doing the 
writes. Peeking at the number of requests first (or even some reasonable 
hard-coded limit) should prevent starving the inquirers in case of a 
steady stream of inquiries.


regards
Tomas

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

2016-03-14 Thread Noah Misch
[Aside: your new mail editor is rewrapping lines in quoted material, and the
result is messy.  I have rerewrapped one paragraph below.]

On Mon, Mar 14, 2016 at 02:00:03AM +0100, Tomas Vondra wrote:
> On Sun, 2016-03-13 at 18:46 -0400, Noah Misch wrote:
> > I've not attempted to study the behavior on slow hardware.  Instead, my
> > report used stat-coalesce-v1.patch[1] to simulate slow writes.  (That
> > diagnostic patch no longer applies cleanly, so I'm attaching a rebased
> > version.  I've changed the patch name from "stat-coalesce" to
> > "slow-stat-simulate" to more-clearly distinguish it from the
> > "pgstat-coalesce" patch.)  Problems remain after applying your patch;
> > consider "VACUUM pg_am" behavior:
> > 
> > 9.2 w/ stat-coalesce-v1.patch:
> >   VACUUM returns in 3s, stats collector writes each file 1x over 3s
> > HEAD w/ slow-stat-simulate-v2.patch:
> >   VACUUM returns in 3s, stats collector writes each file 5x over 15s
> > HEAD w/ slow-stat-simulate-v2.patch and your patch:
> >   VACUUM returns in 10s, stats collector writes no files over 10s
> 
> Oh damn, the timestamp comparison in pgstat_recv_inquiry should be in
> the opposite direction. After fixing that "VACUUM pg_am" completes in 3
> seconds and writes each file just once.

That fixes things.  "make check" passes under an 8s stats write time.

> --- a/src/backend/postmaster/pgstat.c
> +++ b/src/backend/postmaster/pgstat.c
> @@ -4836,6 +4836,20 @@ pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
>   }
>  
>   /*
> +  * Ignore requests that are already resolved by the last write.
> +  *
> +  * We discard the queue of requests at the end of pgstat_write_statsfiles(),
> +  * so the requests already waiting on the UDP socket at that moment can't
> +  * be discarded in the previous loop.
> +  *
> +  * XXX Maybe this should also care about the clock skew, just like the
> +  * block a few lines down.

Yes, it should.  (The problem is large (>~100s), backward clock resets, not
skew.)  A clock reset causing "msg->clock_time < dbentry->stats_timestamp"
will usually also cause "msg->cutoff_time < dbentry->stats_timestamp".  Such
cases need the correction a few lines down.
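
A concrete illustration, with invented timestamps:

    stats file last written:  dbentry->stats_timestamp = 12:00:00
    clock jumps back 200s; the next inquiry then carries
        msg->clock_time  = 11:56:40   (< stats_timestamp)
        msg->cutoff_time = 11:56:30   (< stats_timestamp)

An "already satisfied by the last write" test alone would discard every
inquiry until the clock passes 12:00:00 again, leaving the stats file
stale for that whole interval; the correction backdates
dbentry->stats_timestamp so writes resume.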

The other thing needed here is to look through and update comments about
last_statrequests.  For example, this loop is dead code due to us writing
files as soon as we receive one inquiry:

/*
 * Find the last write request for this DB.  If it's older than the
 * request's cutoff time, update it; otherwise there's nothing to do.
 *
 * Note that if a request is found, we return early and skip the below
 * check for clock skew.  This is okay, since the only way for a DB
 * request to be present in the list is that we have been here since the
 * last write round.
 */
slist_foreach(iter, &last_statrequests) ...

I'm okay keeping the dead code for future flexibility, but the comment should
reflect that.

Thanks,
nm




Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

2016-03-13 Thread Tomas Vondra
On Sun, 2016-03-13 at 18:46 -0400, Noah Misch wrote:
> On Thu, Mar 03, 2016 at 06:08:07AM +0100, Tomas Vondra wrote:
> > On 02/03/2016 06:46 AM, Noah Misch wrote:
> > > On Mon, Feb 01, 2016 at 07:03:45PM +0100, Tomas Vondra wrote:
> > > > On 12/22/2015 03:49 PM, Noah Misch wrote:
> > > > > On Mon, Feb 18, 2013 at 06:19:12PM -0300, Alvaro Herrera wrote:
> > > > > > I have pushed it now.  Further testing, of course, is always
> > > > > > welcome.
> > > > > 
> > > > > While investigating stats.sql buildfarm failures, mostly on animals
> > > > > axolotl and shearwater, I found that this patch (commit 187492b)
> > > > > inadvertently removed the collector's ability to coalesce inquiries.
> > > > > Every PGSTAT_MTYPE_INQUIRY received now causes one stats file write.
> > > > > Before, pgstat_recv_inquiry() did:
> > > > > 
> > > > >   if (msg->inquiry_time > last_statrequest)
> > > > >   last_statrequest = msg->inquiry_time;
> > > > > 
> > > > > and pgstat_write_statsfile() did:
> > > > > 
> > > > >   globalStats.stats_timestamp = GetCurrentTimestamp();
> > > > >   ... (work of writing a stats file) ...
> > > > >   last_statwrite = globalStats.stats_timestamp;
> > > > >   last_statrequest = last_statwrite;
> > > > > 
> > > > > If the collector entered pgstat_write_statsfile() with more inquiries
> > > > > waiting in its socket receive buffer, it would ignore them as being
> > > > > too old once it finished the write and resumed message processing.
> > > > > Commit 187492b converted last_statrequest to a "last_statrequests"
> > > > > list that we wipe after each write.
> 
> > So I've been looking at this today, and I think the attached patch should
> > do the trick. I can't really verify it, because I've been unable to
> > reproduce the non-coalescing - I presume it requires a much slower system
> > (axolotl is RPi, not sure about shearwater).
> > 
> > The patch simply checks DBEntry.stats_timestamp in pgstat_recv_inquiry()
> > and ignores requests that are already resolved by the last write (maybe
> > this should accept a file written up to PGSTAT_STAT_INTERVAL in the past).
> > 
> > The required field is already in DBEntry (otherwise it'd be impossible to
> > determine if backends need to send inquiries or not), so this is a pretty
> > trivial change. I can't think of a simpler patch.
> > 
> > Can you try applying the patch on a machine where the problem is
> > reproducible? I might have some RPi machines laying around (for other
> > purposes).
> 
> I've not attempted to study the behavior on slow hardware.  Instead, my
> report used stat-coalesce-v1.patch[1] to simulate slow writes.  (That
> diagnostic patch no longer applies cleanly, so I'm attaching a rebased
> version.  I've changed the patch name from "stat-coalesce" to
> "slow-stat-simulate" to more-clearly distinguish it from the
> "pgstat-coalesce" patch.)  Problems remain after applying your patch;
> consider "VACUUM pg_am" behavior:
> 
> 9.2 w/ stat-coalesce-v1.patch:
>   VACUUM returns in 3s, stats collector writes each file 1x over 3s
> HEAD w/ slow-stat-simulate-v2.patch:
>   VACUUM returns in 3s, stats collector writes each file 5x over 15s
> HEAD w/ slow-stat-simulate-v2.patch and your patch:
>   VACUUM returns in 10s, stats collector writes no files over 10s

Oh damn, the timestamp comparison in pgstat_recv_inquiry should be in
the opposite direction. After fixing that "VACUUM pg_am" completes in 3
seconds and writes each file just once.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 14afef6..e750d46 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4836,6 +4836,20 @@ pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
 	}
 
 	/*
+	 * Ignore requests that are already resolved by the last write.
+	 *
+	 * We discard the queue of requests at the end of pgstat_write_statsfiles(),
+	 * so the requests already waiting on the UDP socket at that moment can't
+	 * be discarded in the previous loop.
+	 *
+	 * XXX Maybe this should also care about the clock skew, just like the
+	 * block a few lines down.
+	 */
+	dbentry = pgstat_get_db_entry(msg->databaseid, false);
+	if ((dbentry != NULL) && (msg->cutoff_time <= dbentry->stats_timestamp))
+		return;
+
+	/*
 	 * There's no request for this DB yet, so create one.
 	 */
 	newreq = palloc(sizeof(DBWriteRequest));
@@ -4852,7 +4866,6 @@ pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
 	 * retreat in the system clock reading could otherwise cause us to neglect
 	 * to update the stats file for a long time.
 	 */
-	dbentry = pgstat_get_db_entry(msg->databaseid, false);
 	if ((dbentry != NULL) 

Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

2016-03-13 Thread Noah Misch
On Thu, Mar 03, 2016 at 06:08:07AM +0100, Tomas Vondra wrote:
> On 02/03/2016 06:46 AM, Noah Misch wrote:
> >On Mon, Feb 01, 2016 at 07:03:45PM +0100, Tomas Vondra wrote:
> >>On 12/22/2015 03:49 PM, Noah Misch wrote:
> >>>On Mon, Feb 18, 2013 at 06:19:12PM -0300, Alvaro Herrera wrote:
> I have pushed it now.  Further testing, of course, is always welcome.
> >>>
> >>>While investigating stats.sql buildfarm failures, mostly on animals
> >>>axolotl and shearwater, I found that this patch (commit 187492b)
> >>>inadvertently removed the collector's ability to coalesce inquiries.
> >>>Every PGSTAT_MTYPE_INQUIRY received now causes one stats file write.
> >>>Before, pgstat_recv_inquiry() did:
> >>>
> >>>   if (msg->inquiry_time > last_statrequest)
> >>>   last_statrequest = msg->inquiry_time;
> >>>
> >>>and pgstat_write_statsfile() did:
> >>>
> >>>   globalStats.stats_timestamp = GetCurrentTimestamp();
> >>>... (work of writing a stats file) ...
> >>>   last_statwrite = globalStats.stats_timestamp;
> >>>   last_statrequest = last_statwrite;
> >>>
> >>>If the collector entered pgstat_write_statsfile() with more inquiries
> >>>waiting in its socket receive buffer, it would ignore them as being too
> >>>old once it finished the write and resumed message processing. Commit
> >>>187492b converted last_statrequest to a "last_statrequests" list that we
> >>>wipe after each write.
> 
> So I've been looking at this today, and I think the attached patch should do
> the trick. I can't really verify it, because I've been unable to reproduce the
> non-coalescing - I presume it requires a much slower system (axolotl is RPi, not
> sure about shearwater).
> 
> The patch simply checks DBEntry.stats_timestamp in pgstat_recv_inquiry() and
> ignores requests that are already resolved by the last write (maybe this
> should accept a file written up to PGSTAT_STAT_INTERVAL in the past).
> 
> The required field is already in DBEntry (otherwise it'd be impossible to
> determine if backends need to send inquiries or not), so this is a pretty
> trivial change. I can't think of a simpler patch.
> 
> Can you try applying the patch on a machine where the problem is
> reproducible? I might have some RPi machines laying around (for other
> purposes).

I've not attempted to study the behavior on slow hardware.  Instead, my report
used stat-coalesce-v1.patch[1] to simulate slow writes.  (That diagnostic
patch no longer applies cleanly, so I'm attaching a rebased version.  I've
changed the patch name from "stat-coalesce" to "slow-stat-simulate" to
more-clearly distinguish it from the "pgstat-coalesce" patch.)  Problems
remain after applying your patch; consider "VACUUM pg_am" behavior:

9.2 w/ stat-coalesce-v1.patch:
  VACUUM returns in 3s, stats collector writes each file 1x over 3s
HEAD w/ slow-stat-simulate-v2.patch:
  VACUUM returns in 3s, stats collector writes each file 5x over 15s
HEAD w/ slow-stat-simulate-v2.patch and your patch:
  VACUUM returns in 10s, stats collector writes no files over 10s


[1] http://www.postgresql.org/message-id/20151222144950.ga2553...@tornado.leadboat.com
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 14afef6..4308df2 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3444,6 +3444,7 @@ pgstat_send_bgwriter(void)
 NON_EXEC_STATIC void
 PgstatCollectorMain(int argc, char *argv[])
 {
+   unsigned total = 0;
int len;
PgStat_Msg  msg;
int wr;
@@ -3555,6 +3556,10 @@ PgstatCollectorMain(int argc, char *argv[])
errmsg("could not read statistics message: %m")));
}
 
+   elog(LOG, "stats %d: %u + %u = %u",
+msg.msg_hdr.m_type, total, len, total + len);
+   total += len;
+
/*
 * We ignore messages that are smaller than our common header
 */
@@ -3947,6 +3952,13 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 */
fputc('E', fpout);
 
+   if (1)
+   {
+   PG_SETMASK(&BlockSig);
+   pg_usleep(3 * 1000000L);
+   PG_SETMASK(&UnBlockSig);
+   }
+
if (ferror(fpout))
{
ereport(LOG,
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index a811265..064cf9f 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -63,6 +63,9 @@ begin
 SELECT (n_tup_ins > 0) INTO updated3
   FROM pg_stat_user_tables WHERE relname='trunc_stats_test';
 
+raise log 'stats updated as of % snapshot: 1:% 2:% 3:%',
+  pg_stat_get_snapshot_timestamp(), updated1, updated2, updated3;
+
 exit when updated1 and updated2 and updated3;
 
 -- wait a little
diff --git a/src/test/regress/sql/stats.sql 

Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

2016-03-02 Thread Tomas Vondra

Hi,

On 02/03/2016 06:46 AM, Noah Misch wrote:

On Mon, Feb 01, 2016 at 07:03:45PM +0100, Tomas Vondra wrote:

On 12/22/2015 03:49 PM, Noah Misch wrote:

On Mon, Feb 18, 2013 at 06:19:12PM -0300, Alvaro Herrera wrote:

I have pushed it now.  Further testing, of course, is always welcome.


While investigating stats.sql buildfarm failures, mostly on animals
axolotl and shearwater, I found that this patch (commit 187492b)
inadvertently removed the collector's ability to coalesce inquiries.
Every PGSTAT_MTYPE_INQUIRY received now causes one stats file write.
Before, pgstat_recv_inquiry() did:

if (msg->inquiry_time > last_statrequest)
last_statrequest = msg->inquiry_time;

and pgstat_write_statsfile() did:

globalStats.stats_timestamp = GetCurrentTimestamp();
... (work of writing a stats file) ...
last_statwrite = globalStats.stats_timestamp;
last_statrequest = last_statwrite;

If the collector entered pgstat_write_statsfile() with more inquiries
waiting in its socket receive buffer, it would ignore them as being too
old once it finished the write and resumed message processing. Commit
187492b converted last_statrequest to a "last_statrequests" list that we
wipe after each write.


So I've been looking at this today, and I think the attached patch should do
the trick. I can't really verify it, because I've been unable to reproduce the
non-coalescing - I presume it requires a much slower system (axolotl is RPi, not
sure about shearwater).

The patch simply checks DBEntry.stats_timestamp in pgstat_recv_inquiry() and 
ignores requests that are already resolved by the last write (maybe this should 
accept a file written up to PGSTAT_STAT_INTERVAL in the past).


The required field is already in DBEntry (otherwise it'd be impossible to 
determine if backends need to send inquiries or not), so this is a pretty trivial 
change. I can't think of a simpler patch.


Can you try applying the patch on a machine where the problem is reproducible? 
I might have some RPi machines laying around (for other purposes).


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgstat-coalesce-v1.patch
Description: binary/octet-stream



Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

2016-02-03 Thread Tomas Vondra

On 02/03/2016 06:46 AM, Noah Misch wrote:

On Mon, Feb 01, 2016 at 07:03:45PM +0100, Tomas Vondra wrote:

On 12/22/2015 03:49 PM, Noah Misch wrote:

...

If the collector entered pgstat_write_statsfile() with more
inquiries waiting in its socket receive buffer, it would ignore
them as being too old once it finished the write and resumed
message processing. Commit 187492b converted last_statrequest to
a "last_statrequests" list that we wipe after each write.


So essentially we remove the list of requests, and thus on the next
round we don't know the timestamp of the last request and write the
file again unnecessarily. Do I get that right?


Essentially right. Specifically, for each database, we must remember
the globalStats.stats_timestamp of the most recent write. It could be
okay to forget the last request timestamp. (I now doubt I picked the
best lines to quote, above.)


What if we instead kept the list but marked the requests as
'invalid' so that we know the timestamp? In that case we'd be able
to do pretty much exactly what the original code did (but at per-db
granularity).


The most natural translation of the old code would be to add a
write_time field to struct DBWriteRequest. One can infer "invalid"
from write_time and request_time. There are other reasonable designs,
though.


OK, makes sense. I'll look into that.




We'd have to clean up the list once in a while so it does not grow
excessively large, but something like removing entries older than
PGSTAT_STAT_INTERVAL should be enough.


Specifically, if you assume the socket delivers messages in the order
sent, you may as well discard entries having write_time at least
PGSTAT_STAT_INTERVAL older than the most recent cutoff_time seen in a
PgStat_MsgInquiry. That delivery order assumption does not hold in
general, but I expect it's close enough for this purpose.


Agreed. If I get that right, it might result in some false negatives (in 
the sense that we'll remove a record too early, forcing us to write the 
database file again). But I expect that to be a rare case.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

2016-02-02 Thread Noah Misch
On Mon, Feb 01, 2016 at 07:03:45PM +0100, Tomas Vondra wrote:
> On 12/22/2015 03:49 PM, Noah Misch wrote:
> >On Mon, Feb 18, 2013 at 06:19:12PM -0300, Alvaro Herrera wrote:
> >>I have pushed it now.  Further testing, of course, is always welcome.
> >
> >While investigating stats.sql buildfarm failures, mostly on animals axolotl
> >and shearwater, I found that this patch (commit 187492b) inadvertently 
> >removed
> >the collector's ability to coalesce inquiries.  Every PGSTAT_MTYPE_INQUIRY
> >received now causes one stats file write.  Before, pgstat_recv_inquiry() did:
> >
> > if (msg->inquiry_time > last_statrequest)
> > last_statrequest = msg->inquiry_time;
> >
> >and pgstat_write_statsfile() did:
> >
> > globalStats.stats_timestamp = GetCurrentTimestamp();
> >... (work of writing a stats file) ...
> > last_statwrite = globalStats.stats_timestamp;
> > last_statrequest = last_statwrite;
> >
> >If the collector entered pgstat_write_statsfile() with more inquiries waiting
> >in its socket receive buffer, it would ignore them as being too old once it
> >finished the write and resumed message processing.  Commit 187492b converted
> >last_statrequest to a "last_statrequests" list that we wipe after each write.
> 
> So essentially we remove the list of requests, and thus on the next round we
> don't know the timestamp of the last request and write the file again
> unnecessarily. Do I get that right?

Essentially right.  Specifically, for each database, we must remember the
globalStats.stats_timestamp of the most recent write.  It could be okay to
forget the last request timestamp.  (I now doubt I picked the best lines to
quote, above.)

> What if we instead kept the list but marked the requests as 'invalid' so
> that we know the timestamp? In that case we'd be able to do pretty much
> exactly what the original code did (but at per-db granularity).

The most natural translation of the old code would be to add a write_time
field to struct DBWriteRequest.  One can infer "invalid" from write_time and
request_time.  There are other reasonable designs, though.
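
For shape, the existing struct with that field added (stand-in typedefs
here; slist_node really comes from lib/ilist.h):

typedef unsigned int Oid;                       /* stand-in */
typedef long long TimestampTz;                  /* stand-in */
typedef struct slist_node { struct slist_node *next; } slist_node;

typedef struct DBWriteRequest
{
    Oid         databaseid;     /* database to write */
    TimestampTz request_time;   /* timestamp of the last write request */
    TimestampTz write_time;     /* proposed: last write covering this DB */
    slist_node  next;           /* list link in last_statrequests */
} DBWriteRequest;

/* a request is "invalid" (already satisfied) once
 * write_time >= request_time */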

> We'd have to clean up the list once in a while so it does not grow excessively large,
> but something like removing entries older than PGSTAT_STAT_INTERVAL should
> be enough.

Specifically, if you assume the socket delivers messages in the order sent,
you may as well discard entries having write_time at least
PGSTAT_STAT_INTERVAL older than the most recent cutoff_time seen in a
PgStat_MsgInquiry.  That delivery order assumption does not hold in general,
but I expect it's close enough for this purpose.
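
A sketch of that pruning pass, assuming the write_time field above and
PostgreSQL's slist API from lib/ilist.h (PGSTAT_STAT_INTERVAL is in
milliseconds, TimestampTz in microseconds, hence the * 1000):

static void
prune_satisfied_requests(TimestampTz newest_cutoff)
{
    slist_mutable_iter iter;

    slist_foreach_modify(iter, &last_statrequests)
    {
        DBWriteRequest *req = slist_container(DBWriteRequest, next,
                                              iter.cur);

        /* written, and old enough that no in-flight inquiry can miss it */
        if (req->write_time >= req->request_time &&
            req->write_time + PGSTAT_STAT_INTERVAL * 1000 <= newest_cutoff)
            slist_delete_current(&iter);
    }
}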

> >As for how to fix this, the most direct translation of the old logic is to
> >retain last_statrequests entries that could help coalesce inquiries.I lean
> >toward that for an initial, back-patched fix.
> 
> That seems reasonable and I believe it's pretty much the idea I came up with
> above, right? Depending on how you define "entries that could help coalesce
> inquiries".

Yes.




Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

2016-02-01 Thread Tomas Vondra

Hi,

On 12/22/2015 03:49 PM, Noah Misch wrote:

On Mon, Feb 18, 2013 at 06:19:12PM -0300, Alvaro Herrera wrote:

I have pushed it now.  Further testing, of course, is always welcome.


While investigating stats.sql buildfarm failures, mostly on animals axolotl
and shearwater, I found that this patch (commit 187492b) inadvertently removed
the collector's ability to coalesce inquiries.  Every PGSTAT_MTYPE_INQUIRY
received now causes one stats file write.  Before, pgstat_recv_inquiry() did:

if (msg->inquiry_time > last_statrequest)
last_statrequest = msg->inquiry_time;

and pgstat_write_statsfile() did:

globalStats.stats_timestamp = GetCurrentTimestamp();
... (work of writing a stats file) ...
last_statwrite = globalStats.stats_timestamp;
last_statrequest = last_statwrite;

If the collector entered pgstat_write_statsfile() with more inquiries waiting
in its socket receive buffer, it would ignore them as being too old once it
finished the write and resumed message processing.  Commit 187492b converted
last_statrequest to a "last_statrequests" list that we wipe after each write.


So essentially we remove the list of requests, and thus on the next 
round we don't know the timestamp of the last request and write the file 
again unnecessarily. Do I get that right?


What if we instead kept the list but marked the requests as 'invalid' so 
that we know the timestamp? In that case we'd be able to do pretty much 
exactly what the original code did (but at per-db granularity).


We'd have to clean up the list once in a while so it does not grow excessively 
large, but something like removing entries older than 
PGSTAT_STAT_INTERVAL should be enough.


Actually, I think that was the idea when I wrote the patch, but 
apparently I got distracted and it did not make it into the code.




I modeled a machine with slow stats writes using the attached diagnostic patch
(not for commit).  It has pgstat_write_statsfiles() sleep just before renaming
the temporary file, and it logs each stats message received.  A three second
delay causes stats.sql to fail the way it did on shearwater[1] and on
axolotl[2].  Inquiries accumulate until the socket receive buffer overflows,
at which point the socket drops stats messages whose effects we later test
for.  The 3s delay makes us drop some 85% of installcheck stats messages.
Logs from a single VACUUM show receipt of five inquiries ("stats 1:") with 3s
between them:

24239 2015-12-10 04:21:03.439 GMT LOG:  connection authorized: user=nmisch database=test
24236 2015-12-10 04:21:03.454 GMT LOG:  stats 2: 1888 + 936 = 2824
24236 2015-12-10 04:21:03.454 GMT LOG:  stats 2: 2824 + 376 = 3200
24236 2015-12-10 04:21:03.454 GMT LOG:  stats 2: 3200 + 824 = 4024
24239 2015-12-10 04:21:03.455 GMT LOG:  statement: vacuum pg_class
24236 2015-12-10 04:21:03.455 GMT LOG:  stats 1: 4024 + 32 = 4056
24236 2015-12-10 04:21:06.458 GMT LOG:  stats 12: 4056 + 88 = 4144
24236 2015-12-10 04:21:06.458 GMT LOG:  stats 1: 4144 + 32 = 4176
24239 2015-12-10 04:21:06.463 GMT LOG:  disconnection: session time: 0:00:03.025 user=nmisch database=test host=[local]
24236 2015-12-10 04:21:09.486 GMT LOG:  stats 1: 4176 + 32 = 4208
24236 2015-12-10 04:21:12.503 GMT LOG:  stats 1: 4208 + 32 = 4240
24236 2015-12-10 04:21:15.519 GMT LOG:  stats 1: 4240 + 32 = 4272
24236 2015-12-10 04:21:18.535 GMT LOG:  stats 9: 4272 + 48 = 4320
24236 2015-12-10 04:21:18.535 GMT LOG:  stats 2: 4320 + 936 = 5256
24236 2015-12-10 04:21:18.535 GMT LOG:  stats 2: 5256 + 376 = 5632
24236 2015-12-10 04:21:18.535 GMT LOG:  stats 2: 5632 + 264 = 5896
24236 2015-12-10 04:21:18.535 GMT LOG:  stats 12: 5896 + 88 = 5984


As for how to fix this, the most direct translation of the old logic is to
retain last_statrequests entries that could help coalesce inquiries.  I lean
toward that for an initial, back-patched fix.


That seems reasonable and I believe it's pretty much the idea I came up 
with above, right? Depending on how you define "entries that could help 
coalesce inquiries".


It would be good, though, to
process two closely-spaced, different-database inquiries in one
pgstat_write_statsfiles() call.  We do one-database writes and all-databases
writes, but we never write "1 < N < all" databases despite the code prepared
to do so.  I tried calling pgstat_write_statsfiles() only when the socket
receive buffer empties.  That's dead simple to implement and aggressively
coalesces inquiries (even a 45s sleep did not break stats.sql), but it starves
inquirers if the socket receive buffer stays full persistently.  Ideally, I'd
want to process inquiries when the buffer empties _or_ when the oldest inquiry
is X seconds old.  I don't have a more-specific design in mind, though.


That's a nice idea, but I agree that binding the coalescing to the buffer 
like this seems like a pretty bad idea, exactly because of the starving. 
What might work, though, is if we could look at how much data is there in 
the