Re: [HACKERS] Broken system timekeeping breaks the stats collector

2012-06-17 Thread Simon Riggs
On 17 June 2012 08:26, Tom Lane t...@sss.pgh.pa.us wrote:

 (1) In backend_read_statsfile, make an initial attempt to read the stats
 file and then read GetCurrentTimestamp after that.  If the local clock
 reading is less than the stats file's timestamp, we know that some sort
 of clock skew or glitch has happened, so force an inquiry message to be
 sent with the local timestamp.  But then accept the stats file anyway,
 since the skew might be small and harmless.  The reason for the forced
 inquiry message is to cause (2) to happen at the collector.

Fine, but please log this as a WARNING "system time skew detected", so
we can actually see it has happened rather than just silently
accepting the situation.

It would be useful to document whether there are any other negative
effects from altering system time.

Perhaps we should do the same test at startup to see if the clock has
gone backwards then also. Perhaps we should also make note of any
major changes in time since last startup, which might help us detect
other forms of corruption.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Broken system timekeeping breaks the stats collector

2012-06-17 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 Fine, but please log this as a WARNING "system time skew detected", so
 we can actually see it has happened rather than just silently
 accepting the situation.

I think elog(LOG) is more appropriate, same as we have for the existing
messages for related complaints.  No one backend is going to have a
complete view of the situation, and the collector itself has to use
LOG since it has no connected client at all.  So the postmaster log
is the place to look for evidence of clock trouble.

 Perhaps we should do the same test at startup to see if the clock has
 gone backwards then also.

Uh ... backwards from what?  And what difference would it make?  We
always force an immediate write of the stats file at startup anyway.

regards, tom lane



Re: [HACKERS] Broken system timekeeping breaks the stats collector

2012-06-16 Thread Dickson S. Guedes
2012/6/16 Tom Lane t...@sss.pgh.pa.us:
[... cut ...]
 (1) In backend_read_statsfile, make an initial attempt to read the stats
 file and then read GetCurrentTimestamp after that.  If the local clock
 reading is less than the stats file's timestamp, we know that some sort
 of clock skew or glitch has happened, so force an inquiry message to be
 sent with the local timestamp.  But then accept the stats file anyway,
 since the skew might be small and harmless.  The reason for the forced
 inquiry message is to cause (2) to happen at the collector.

 (2) In pgstat_recv_inquiry, if the received inquiry_time is older than
 last_statwrite, we should suspect a clock glitch (though it might just
 indicate delayed message receipt).  In this case, do a fresh
 GetCurrentTimestamp call, and if the reading is less than
 last_statwrite, we know that the collector's time went backwards.
 To recover, reset these variables as we do at startup:
        last_statrequest = GetCurrentTimestamp();
        last_statwrite = last_statrequest - 1;
 to force an immediate write to happen with the new local time.

 (1) is basically free in terms of the amount of work done in non-broken
 cases, though it will require a few more lines of code.  (2) means
 adding some GetCurrentTimestamp calls that did not occur before, but
 hopefully these will be infrequent, since in the absence of clock
 glitches they would only happen when a backend's demand for a new stats
 file is generated before the collector starts to write a new stats file
 but not received till afterwards.

 Comments?  Anyone see a flaw in this design?  Or want to argue that
 we shouldn't do anything about such cases?

What happens when Daylight saving time ends? Or it doesn't matter in
this scenario?

regards
-- 
Dickson S. Guedes
mail/xmpp: gue...@guedesoft.net - skype: guediz
http://guedesoft.net - http://www.postgresql.org.br



Re: [HACKERS] Broken system timekeeping breaks the stats collector

2012-06-16 Thread Tom Lane
Dickson S. Guedes lis...@guedesoft.net writes:
 What happens when Daylight saving time ends? Or it doesn't matter in
 this scenario?

Irrelevant, we're working in UTC-based timestamps.
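
[Editorial sketch] A small illustration of why the end of daylight saving time does not matter for UTC-based timestamps. It uses plain `time_t` epoch seconds as a stand-in for the server's timestamps, and assumes a POSIX libc that understands the TZ rule string shown; the helper names are hypothetical. The two instants are 05:30 and 06:30 UTC on 2012-11-04, one hour apart across the US Eastern fall-back transition.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Select US Eastern rules via a POSIX TZ string (no tzdata files needed). */
void
use_us_eastern_rules(void)
{
    setenv("TZ", "EST5EDT,M3.2.0,M11.1.0", 1);
    tzset();
}

/* Local wall-clock time of epoch value t, encoded as hour * 100 + minute. */
int
local_hhmm(time_t t)
{
    struct tm tm_buf;
    struct tm *ok = localtime_r(&t, &tm_buf);

    assert(ok != NULL);
    (void) ok;
    return tm_buf.tm_hour * 100 + tm_buf.tm_min;
}
```

With these helpers, `local_hhmm((time_t) 1352007000)` and `local_hhmm((time_t) 1352010600)` should both report 01:30 local time after `use_us_eastern_rules()`, yet the second epoch value is 3600 seconds larger than the first. Only the wall-clock rendering repeats; the UTC-based value keeps increasing, so a comparison between two such timestamps is unaffected.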

regards, tom lane
