[HACKERS] Avoiding overflow in timeout-related calculations

2012-11-18 Thread Tom Lane
The discussion of bug #7670 showed that what's happening there is that
if you specify a log_rotation_age of more than 25 days (2^31 msec),
WaitLatch will sometimes be passed a timeout of more than 2^31 msec,
leading to unportable behavior.  At least some kernels will return
EINVAL for that, and it's not very clear what will happen on others.

After some thought about this, I think the best thing to do is to tweak
syslogger.c to to clamp the requested sleep to INT_MAX msec.  The fact
that a couple of people have tried to set log_rotation_age to 30 days or
more suggests that it's useful, so reducing the GUC's upper limit isn't
a desirable fix.  This should be an easy change since the logic in that
loop will already behave correctly if it's woken up before the requested
rotation time.

I went looking for other timeout-related GUC variables that might have
overoptimistic upper limits, and found these cases:

PostAuthDelay is converted to microseconds, and therefore had better
have an upper bound of INT_MAX/100 seconds.  (Larger values would
work on machines with 64-bit long, but it doesn't seem worth
complicating the code to allow for that.)

contrib/auth_delay's auth_delay_milliseconds is likewise converted
to microseconds, so its max had better be INT_MAX/1000 msec.  (Likewise,
I don't see any value in supporting larger limits.)

XLogArchiveTimeout, although it's never multiplied by anything,
is compared to 
((int) (now - last_xlog_switch_time))
which means that its nominal maximum of INT_MAX is problematic:
if you actually set it to that it would be quite possible for
the system to miss the timeout occurrence if it didn't chance
to service the timeout signal in exactly the last second of the
interval.  (After that second, the above-quoted expression would
overflow and lead to the wrong comparison result.)  So it seems
to me we'd better back it off some.  I'm inclined to propose
dropping it to INT_MAX/2 seconds.

Comments, objections?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Avoiding overflow in timeout-related calculations

2012-11-18 Thread Andres Freund
On 2012-11-18 14:57:51 -0500, Tom Lane wrote:
 The discussion of bug #7670 showed that what's happening there is that
 if you specify a log_rotation_age of more than 25 days (2^31 msec),
 WaitLatch will sometimes be passed a timeout of more than 2^31 msec,
 leading to unportable behavior.  At least some kernels will return
 EINVAL for that, and it's not very clear what will happen on others.

 After some thought about this, I think the best thing to do is to tweak
 syslogger.c to to clamp the requested sleep to INT_MAX msec.  The fact
 that a couple of people have tried to set log_rotation_age to 30 days or
 more suggests that it's useful, so reducing the GUC's upper limit isn't
 a desirable fix.  This should be an easy change since the logic in that
 loop will already behave correctly if it's woken up before the requested
 rotation time.

Cool. Agreed.

 I went looking for other timeout-related GUC variables that might have
 overoptimistic upper limits, and found these cases:

 [sensible stuff]

Lowering the maximum of those seems sensible to me. Anybody using that
large value for those already had a problem even if it worked.

I think at least wal_sender_timeout and wal_receiver_timeout are also
problematic.

Greetings,

Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Avoiding overflow in timeout-related calculations

2012-11-18 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 I think at least wal_sender_timeout and wal_receiver_timeout are also
 problematic.

I looked at those and didn't see a problem --- what are you worried
about exactly?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Avoiding overflow in timeout-related calculations

2012-11-18 Thread Andres Freund
On 2012-11-18 15:21:34 -0500, Tom Lane wrote:
 Andres Freund and...@anarazel.de writes:
  I think at least wal_sender_timeout and wal_receiver_timeout are also
  problematic.
 
 I looked at those and didn't see a problem --- what are you worried
 about exactly?

Forget it, too hungry to read the code properly...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers