Re: API explosion (Re: [RFC/RFT] calloutng)
On Wed, Dec 19, 2012 at 11:00:06AM +, Poul-Henning Kamp wrote: In message 50d192e8.3020...@freebsd.org, Alexander Motin writes: Linux uses 32.32 format in their eventtimers code. (And that is no accident, I know who they got the number from :-) But if at some point we want to be able to handle absolute wall time, [...] Then you have other problems, including but not limited to clock being stepped, leap-seconds, suspend/resume and frequency stability. If you want to support callouts of the type (At 14:00 UTC tomorrow) (disregarding the time-zone issue), you need to catch all significant changes to our UTC estimate and recalibrate your callout based on that. It is not obvious that we have applications for such an API that warrant the complexity. Either way, such a facility should be layered on top of the callout facility, which should always run in elapsed time[1] with no attention paid to what NTPD might do to the UTC estimate. POSIX specifies functions that assume such a facility exists, although applications may not care much if we implement them incorrectly. For example, pthread_mutex_timedlock() and sem_timedwait() shall time out when the CLOCK_REALTIME clock reaches the given value, and pthread_cond_timedwait() and clock_nanosleep() (with TIMER_ABSTIME) shall time out when the specified clock reaches the given value. So summary: 32.32 is the right format. Poul-Henning [1] Notice that elapsed time needs a firm definition with respect to suspend/resume, and that this decision has big implications for the API use and code duplication. I think it prudent to specify a flag to callouts, to tell what should happen on suspend/resume, something like: SR_CANCEL /* Cancel the callout on S/R */ /* no flag* /* Toll this callout only when system is running */ SR_IGNORE /* Toll suspended time from callout */ If you get this right, callouts from device drivers will just DTRT, if you get it wrong, all device drivers will need boilerplate code to handle S/R Userland could get access to this via CLOCK_REALTIME vs CLOCK_MONOTONIC vs CLOCK_UPTIME. -- Jilles Tjoelker ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: API explosion (Re: [RFC/RFT] calloutng)
In message 2012125025.ga46...@stack.nl, Jilles Tjoelker writes: Either way, such a facility should be layered on top of the callout facility, which should always run in elapsed time[1] with no attention paid to what NTPD might do to the UTC estimate. POSIX specifies functions that assume such a facility exists, although applications may not care much if we implement them incorrectly. It should still be implemented op top of callouts, not as part of: it is an entirely different thing to try to do right. I think it prudent to specify a flag to callouts, to tell what should happen on suspend/resume, something like: SR_CANCEL /* Cancel the callout on S/R */ /* no flag* /* Toll this callout only when system is running */ SR_IGNORE /* Toll suspended time from callout */ If you get this right, callouts from device drivers will just DTRT, if you get it wrong, all device drivers will need boilerplate code to handle S/R Userland could get access to this via CLOCK_REALTIME vs CLOCK_MONOTONIC vs CLOCK_UPTIME. I have _no_ idea what you are trying to say here... -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 p...@freebsd.org | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: API explosion (Re: [RFC/RFT] calloutng)
On 19.12.2012 01:37, Ian Lepore wrote: On Wed, 2012-12-19 at 00:29 +0100, Luigi Rizzo wrote: On Tue, Dec 18, 2012 at 04:27:45PM -0700, Ian Lepore wrote: On Tue, 2012-12-18 at 23:58 +0100, Luigi Rizzo wrote: [top posting for readability; in summary we were discussing the new callout API trying to avoid an explosion of methods and arguments while at the same time supporting the old API and the new one] (I am also Cc-ing phk as he might have better insight on the topic). I think the patch you propose is a step in the right direction, but i still remain concerned by having to pass two bintimes (by reference, but they should really go by value) and one 'ticks' value to all these functions. I am also dubious that we need a full 128 bits to specify the 'precision': there would be absolutely no loss of functionality if we decided to specify the precision in powers of 2, so a precision 'k' (signed) means 2^k seconds. This way 8 bits are enough to represent any precision we want. ... I'm not so sure about the 2^k precision. You speak of seconds, but I would be worrying about sub-second precision in my work. It would typical to want a 500uS timeout but be willing to late by up to 250uS if i said k is signed so negative values represent fractions of a second. 2^-128 is pretty short :) Ahh, I missed that. Good enough then! Hmmm, if that ideas survives further review, then could precision be encoded in 8 bits of the flags, eliminating another parm? Those who tracked the branch could see that actually was our first approach to handle precision. Unfortunately, it appeared not very convenient when you need to convert relative precision in percents or fraction of interval to the absolute precision in power of 2. We were worried that using some ffsll() for it can be inconvenient or expensive. But since we are now talking about passing relative bintime as an argument, that may be more viable option. I'll make another try. Thanks for the input. Pity it didn't happen couple of months ago. -- Alexander Motin ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: API explosion (Re: [RFC/RFT] calloutng)
In message 1355873265.1198.183.ca...@revolution.hippie.lan, Ian Lepore writes : On Tue, 2012-12-18 at 23:58 +0100, Luigi Rizzo wrote: I'm not so sure about the 2^k precision. You speak of seconds, but I would be worrying about sub-second precision in my work. It is a bad idea, and it is physically pointless, given the stabilities of the timebases available for computers in general. Please just take my word as a time-nut, and use a 32.32 binary format in seconds (see previous email) and be done with it. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 p...@freebsd.org | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: API explosion (Re: [RFC/RFT] calloutng)
On Wed, Dec 19, 2012 at 1:54 AM, Poul-Henning Kamp p...@phk.freebsd.dk wrote: In message 1355873265.1198.183.ca...@revolution.hippie.lan, Ian Lepore writes : On Tue, 2012-12-18 at 23:58 +0100, Luigi Rizzo wrote: I'm not so sure about the 2^k precision. You speak of seconds, but I would be worrying about sub-second precision in my work. It is a bad idea, and it is physically pointless, given the stabilities of the timebases available for computers in general. Please just take my word as a time-nut, and use a 32.32 binary format in seconds (see previous email) and be done with it. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 p...@freebsd.org | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. Right now -- the precision is specified in 'bintime', which is a binary number. It's not 32.32, it's 32.64 or 64.64 depending on the size of time_t in the specific platform. I do not really think it worth to create another structure for handling time (e.g. struct bintime32), as it will lead to code duplication for all the basic conversion/math operation. On the other hand, 32.32 may not be enough in the long future. What do you think about that? Thanks, Davide ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: API explosion (Re: [RFC/RFT] calloutng)
On 19.12.2012 12:03, Davide Italiano wrote: On Wed, Dec 19, 2012 at 1:54 AM, Poul-Henning Kamp p...@phk.freebsd.dk wrote: In message 1355873265.1198.183.ca...@revolution.hippie.lan, Ian Lepore writes : On Tue, 2012-12-18 at 23:58 +0100, Luigi Rizzo wrote: I'm not so sure about the 2^k precision. You speak of seconds, but I would be worrying about sub-second precision in my work. It is a bad idea, and it is physically pointless, given the stabilities of the timebases available for computers in general. Please just take my word as a time-nut, and use a 32.32 binary format in seconds (see previous email) and be done with it. Right now -- the precision is specified in 'bintime', which is a binary number. It's not 32.32, it's 32.64 or 64.64 depending on the size of time_t in the specific platform. I do not really think it worth to create another structure for handling time (e.g. struct bintime32), as it will lead to code duplication for all the basic conversion/math operation. On the other hand, 32.32 may not be enough in the long future. What do you think about that? Linux uses 32.32 format in their eventtimers code. Respecting that now we have no any timer hardware with frequency about 4GHz, that precision is probably sufficient. But if at some point we want to be able to handle absolute wall time, then 32bit integer part may be not enough. Then we return to the question: how many different data types do we want to see in one subsystem? Sure, using single 64bit value would be much easier then struct bintime from many perspectives, but what's about the edge cases? -- Alexander Motin ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: API explosion (Re: [RFC/RFT] calloutng)
In message CACYV=-eg542ihm9kfujpvczzra4tqepebva8rzt1yohncgf...@mail.gmail.com , Davide Italiano writes: Right now -- the precision is specified in 'bintime', which is a binary number. It's not 32.32, it's 32.64 or 64.64 depending on the size of time_t in the specific platform. And that is way overkill for specifying a callout, at best your clock has short term stabilities approaching 1e-8, but likely as bad as 1e-6. (The reason why bintime is important for timekeeping is that we accumulate timeintervals approx 1e3 times a second, so the rounding error has to be much smaller than the short term stability in order to not dominate) I do not really think it worth to create another structure for handling time (e.g. struct bintime32), as it will lead to code No, that was exactly my point: It should be an integer so that comparisons and arithmetic is trivial. A 32.32 format fits nicely into a int64_t which is readily available in the language. As I said in my previous email: typedef dur_t int64_t;/* signed for bug catching */ #define DURSEC ((dur_t)1 32) #define DURMIN (DURSEC * 60) #define DURMSEC (DURSEC / 1000) #define DURUSEC (DURSEC / 1000) #define DURNSEC (DURSEC / 100) (Bikeshed the names at your convenience) Then you can say callout_foo(34 * DURSEC) callout_foo(2400 * DURMSEC) or callout_foo(500 * DURNSEC) With this format you can specify callouts 68 years into the future with quarter nanosecond resolution, and you can trivially and efficiently compare dur_t's with if (d1 d2) -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 p...@freebsd.org | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: API explosion (Re: [RFC/RFT] calloutng)
In message 50d192e8.3020...@freebsd.org, Alexander Motin writes: Linux uses 32.32 format in their eventtimers code. (And that is no accident, I know who they got the number from :-) But if at some point we want to be able to handle absolute wall time, [...] Then you have other problems, including but not limited to clock being stepped, leap-seconds, suspend/resume and frequency stability. If you want to support callouts of the type (At 14:00 UTC tomorrow) (disregarding the time-zone issue), you need to catch all significant changes to our UTC estimate and recalibrate your callout based on that. It is not obvious that we have applications for such an API that warrant the complexity. Either way, such a facility should be layered on top of the callout facility, which should always run in elapsed time[1] with no attention paid to what NTPD might do to the UTC estimate. So summary: 32.32 is the right format. Poul-Henning [1] Notice that elapsed time needs a firm definition with respect to suspend/resume, and that this decision has big implications for the API use and code duplication. I think it prudent to specify a flag to callouts, to tell what should happen on suspend/resume, something like: SR_CANCEL /* Cancel the callout on S/R */ /* no flag* /* Toll this callout only when system is running */ SR_IGNORE /* Toll suspended time from callout */ If you get this right, callouts from device drivers will just DTRT, if you get it wrong, all device drivers will need boilerplate code to handle S/R -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 p...@freebsd.org | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: API explosion (Re: [RFC/RFT] calloutng)
On Wed, 19 Dec 2012, Poul-Henning Kamp wrote: In message CACYV=-eg542ihm9kfujpvczzra4tqepebva8rzt1yohncgf...@mail.gmail.com , Davide Italiano writes: Right now -- the precision is specified in 'bintime', which is a binary number. It's not 32.32, it's 32.64 or 64.64 depending on the size of time_t in the specific platform. And that is way overkill for specifying a callout, at best your clock has short term stabilities approaching 1e-8, but likely as bad as 1e-6. So you always agreed with me that bintimes are unsuitable for almost everything, and especially unsuitable for timeouts? :-) (The reason why bintime is important for timekeeping is that we accumulate timeintervals approx 1e3 times a second, so the rounding error has to be much smaller than the short term stability in order to not dominate) bintimes are not unsuitable for timekeeping, but they a painful to use for other APIs. You have to either put bintimes in layers in the other APIs, or convert them to a more suitable format, and there is a problem placing the conversion at points where it is efficient. This thread seems to be mostly about putting the conversion in wrong places. My original objection was about using bintimes for almost everything at the implementation level. I do not really think it worth to create another structure for handling time (e.g. struct bintime32), as it will lead to code No, that was exactly my point: It should be an integer so that comparisons and arithmetic is trivial. A 32.32 format fits nicely into a int64_t which is readily available in the language. I would have tried a 32 bit format with a variable named 'ticks'. Something like: - ticks = 0. Same meaning as now. No changes in ABIs or APIs to use this. The tick period would be constant but for virtual ticks and not too small. hz = 1000 now makes the period too small, and not a power of 2. So make the period 1/128 second. This gives a 1.24.7 binary format. 2**24 seconds is 194 days. - ticks 0. The 31 value bits are now a cookie (descriptor) referring to a bintime or whatever. This case should rarely be used. I don't like it that a tickless kernel, which is needed mainly for power saving, has expanded into complications to support short timeouts which should rarely be used. As I said in my previous email: typedef dur_t int64_t;/* signed for bug catching */ #define DURSEC ((dur_t)1 32) #define DURMIN (DURSEC * 60) #define DURMSEC (DURSEC / 1000) #define DURUSEC (DURSEC / 1000) #define DURNSEC (DURSEC / 100) (Bikeshed the names at your convenience) Then you can say callout_foo(34 * DURSEC) callout_foo(2400 * DURMSEC) or callout_foo(500 * DURNSEC) Constructing the cookie for my special case would not be so easy. With this format you can specify callouts 68 years into the future with quarter nanosecond resolution, and you can trivially and efficiently compare dur_t's with if (d1 d2) This would make a better general format than timevals, timespecs and of course bintimes :-). It is a bit wasteful for timeouts since its extremes are rarely used. Malicious and broken callers can still cause overflow at 68 years, so you have to check for it and handle it. The limit of 194 days is just as good for timeouts. Bruce ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: API explosion (Re: [RFC/RFT] calloutng)
In message 20121219221518.e1...@besplex.bde.org, Bruce Evans writes: With this format you can specify callouts 68 years into the future with quarter nanosecond resolution, and you can trivially and efficiently compare dur_t's with if (d1 d2) This would make a better general format than timevals, timespecs and of course bintimes :-). Except that for absolute timescales, we're running out of the 32 bits integer part. Bintimes is a necessary superset of the 32.32 which tries to work around the necessary but missing int96_t or int128_t[1]. Poul-Henning [1] A good addition to C would be a general multi-word integer type where you could ask for any int%d_t or uint%d_t you cared for, and have the compiler DTRT. In difference from using a multiword-library, this would still give these types their natural integer behaviour. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 p...@freebsd.org | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: API explosion (Re: [RFC/RFT] calloutng)
On Wed, Dec 19, 2012 at 4:18 AM, Bruce Evans b...@optusnet.com.au wrote: On Wed, 19 Dec 2012, Poul-Henning Kamp wrote: In message CACYV=-eg542ihm9kfujpvczzra4tqepebva8rzt1yohncgf...@mail.gmail.com , Davide Italiano writes: Right now -- the precision is specified in 'bintime', which is a binary number. It's not 32.32, it's 32.64 or 64.64 depending on the size of time_t in the specific platform. And that is way overkill for specifying a callout, at best your clock has short term stabilities approaching 1e-8, but likely as bad as 1e-6. So you always agreed with me that bintimes are unsuitable for almost everything, and especially unsuitable for timeouts? :-) (The reason why bintime is important for timekeeping is that we accumulate timeintervals approx 1e3 times a second, so the rounding error has to be much smaller than the short term stability in order to not dominate) bintimes are not unsuitable for timekeeping, but they a painful to use for other APIs. You have to either put bintimes in layers in the other APIs, or convert them to a more suitable format, and there is a problem placing the conversion at points where it is efficient. This thread seems to be mostly about putting the conversion in wrong places. My original objection was about using bintimes for almost everything at the implementation level. I do not really think it worth to create another structure for handling time (e.g. struct bintime32), as it will lead to code No, that was exactly my point: It should be an integer so that comparisons and arithmetic is trivial. A 32.32 format fits nicely into a int64_t which is readily available in the language. I would have tried a 32 bit format with a variable named 'ticks'. Something like: - ticks = 0. Same meaning as now. No changes in ABIs or APIs to use this. The tick period would be constant but for virtual ticks and not too small. hz = 1000 now makes the period too small, and not a power of 2. So make the period 1/128 second. This gives a 1.24.7 binary format. 2**24 seconds is 194 days. - ticks 0. The 31 value bits are now a cookie (descriptor) referring to a bintime or whatever. This case should rarely be used. I don't like it that a tickless kernel, which is needed mainly for power saving, has expanded into complications to support short timeouts which should rarely be used. Bruce, I don't really agree with this. The data addressed by cookie should be still stored somewhere, and KBI will result broken. This, indeed, is not real problem as long as current calloutng code heavily breaks KBI, but if that was your point, I don't see how your proposed change could help. As I said in my previous email: typedef dur_t int64_t;/* signed for bug catching */ #define DURSEC ((dur_t)1 32) #define DURMIN (DURSEC * 60) #define DURMSEC (DURSEC / 1000) #define DURUSEC (DURSEC / 1000) #define DURNSEC (DURSEC / 100) (Bikeshed the names at your convenience) Then you can say callout_foo(34 * DURSEC) callout_foo(2400 * DURMSEC) or callout_foo(500 * DURNSEC) Constructing the cookie for my special case would not be so easy. With this format you can specify callouts 68 years into the future with quarter nanosecond resolution, and you can trivially and efficiently compare dur_t's with if (d1 d2) This would make a better general format than timevals, timespecs and of course bintimes :-). It is a bit wasteful for timeouts since its extremes are rarely used. Malicious and broken callers can still cause overflow at 68 years, so you have to check for it and handle it. The limit of 194 days is just as good for timeouts. Bruce I think the phk's proposal is better. About your overflow objection, I think is really unlikely to happen, but better safe than sorry. Thanks Davide ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: API explosion (Re: [RFC/RFT] calloutng)
In message 20121220005706.i1...@besplex.bde.org, Bruce Evans writes: On Wed, 19 Dec 2012, Poul-Henning Kamp wrote: Except that for absolute timescales, we're running out of the 32 bits integer part. Except 32 bit time_t works until 2106 if it is unsigned. That's sort of not an option. The real problem was that time_t was not defined as a floating point number. [1] A good addition to C would be a general multi-word integer type where you could ask for any int%d_t or uint%d_t you cared for, and have the compiler DTRT. In difference from using a multiword-library, this would still give these types their natural integer behaviour. That would be convenient, but bad for efficiency if it were actually used much. You can say that about anything but CPU-native operations, and I doubt it would be as inefficient as struct bintime, which does not have access to the carry bit. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 p...@freebsd.org | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: API explosion (Re: [RFC/RFT] calloutng)
On Wed, 19 Dec 2012, Poul-Henning Kamp wrote: In message 20121219221518.e1...@besplex.bde.org, Bruce Evans writes: With this format you can specify callouts 68 years into the future with quarter nanosecond resolution, and you can trivially and efficiently compare dur_t's with if (d1 d2) This would make a better general format than timevals, timespecs and of course bintimes :-). Except that for absolute timescales, we're running out of the 32 bits integer part. Except 32 bit time_t works until 2106 if it is unsigned. Bintimes is a necessary superset of the 32.32 which tries to work around the necessary but missing int96_t or int128_t[1]. [1] A good addition to C would be a general multi-word integer type where you could ask for any int%d_t or uint%d_t you cared for, and have the compiler DTRT. In difference from using a multiword-library, this would still give these types their natural integer behaviour. That would be convenient, but bad for efficiency if it were actually used much. Bruce ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: API explosion (Re: [RFC/RFT] calloutng)
On Wed, Dec 19, 2012 at 10:51:48AM +, Poul-Henning Kamp wrote: In message CACYV=-eg542ihm9kfujpvczzra4tqepebva8rzt1yohncgf...@mail.gmail.com , Davide Italiano writes: Right now -- the precision is specified in 'bintime', which is a binary number. It's not 32.32, it's 32.64 or 64.64 depending on the size of time_t in the specific platform. And that is way overkill for specifying a callout, at best your clock has short term stabilities approaching 1e-8, but likely as bad as 1e-6. (The reason why bintime is important for timekeeping is that we accumulate timeintervals approx 1e3 times a second, so the rounding error has to be much smaller than the short term stability in order to not dominate) I do not really think it worth to create another structure for handling time (e.g. struct bintime32), as it will lead to code No, that was exactly my point: It should be an integer so that comparisons and arithmetic is trivial. A 32.32 format fits nicely into a int64_t which is readily available in the language. As I said in my previous email: typedef dur_t int64_t;/* signed for bug catching */ #define DURSEC ((dur_t)1 32) #define DURMIN (DURSEC * 60) #define DURMSEC (DURSEC / 1000) #define DURUSEC (DURSEC / 1000) #define DURNSEC (DURSEC / 100) (Bikeshed the names at your convenience) Then you can say callout_foo(34 * DURSEC) callout_foo(2400 * DURMSEC) or callout_foo(500 * DURNSEC) only thing, we must be careful with the parentheses For instance, in your macro, DURNSEC evaluates to 0 and so does any multiple of it. We should define them as #define DURNSEC DURSEC / 100 ... so DURNSEC is still 0 and 500*DURNSEC gives 214 I am curious that Bruce did not mention this :) (btw the typedef is swapped, should be typedef int64_t dur_t) cheers luigi ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: API explosion (Re: [RFC/RFT] calloutng)
In message 20121219150809.ga98...@onelab2.iet.unipi.it, Luigi Rizzo writes: typedef dur_t int64_t;/* signed for bug catching */ #define DURSEC ((dur_t)1 32) #define DURMIN (DURSEC * 60) #define DURMSEC (DURSEC / 1000) #define DURUSEC (DURSEC / 1000) #define DURNSEC (DURSEC / 100) only thing, we must be careful with the parentheses Actually, it's more impportant to be careful with zeros, if you adjust the above to the correct number of zeros, DURNSEC is 4, which is within seven percent of the correct value. (btw the typedef is swapped, should be typedef int64_t dur_t) Yes, I'm trying to find out of people even listen to me :-) -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 p...@freebsd.org | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: API explosion (Re: [RFC/RFT] calloutng)
On 19.12.2012 16:20, Bruce Evans wrote: On Wed, 19 Dec 2012, Davide Italiano wrote: On Wed, Dec 19, 2012 at 4:18 AM, Bruce Evans b...@optusnet.com.au wrote: I would have tried a 32 bit format with a variable named 'ticks'. Something like: - ticks = 0. Same meaning as now. No changes in ABIs or APIs to use this. The tick period would be constant but for virtual ticks and not too small. hz = 1000 now makes the period too small, and not a power of 2. So make the period 1/128 second. This gives a 1.24.7 binary format. 2**24 seconds is 194 days. - ticks 0. The 31 value bits are now a cookie (descriptor) referring to a bintime or whatever. This case should rarely be used. I don't like it that a tickless kernel, which is needed mainly for power saving, has expanded into complications to support short timeouts which should rarely be used. Bruce, I don't really agree with this. The data addressed by cookie should be still stored somewhere, and KBI will result broken. This, indeed, is not real problem as long as current calloutng code heavily breaks KBI, but if that was your point, I don't see how your proposed change could help. In the old API, it is an error to pass ticks 0, so only broken old callers are affected. Of course, if there are any then it would be hard to detect their garbage cookies. Anywy, it's too later to change to this, and maybe also to a 32.32 format. It would be late to change this after committing. I would definitely like it to be done earlier to not redo all the tests, but I think we could convert callout and eventtimers code to 32.32 format in several days. The only two questions are: do we really want it (won't there be any reasons to regret about it) and how do we want it to look? For the first question my personal showstopper since eventtimers creation always was the wish to keep consistency. But benefits of 32.32 format are clear, and if there are more votes for it, let's consider. If now it will be decided that full range will never be useful for callout subsystem, then it is obviously not needed for eventtimers also. About the second question, what do you think about such prototypes: typedef int64_t sbintime_t static __inline sbintime_t bintime_shrink(struct bintime *bt) {} static __inline struct bintime bintime_expand(sbintime_t sbt) {} ... int callout_reset_bt(struct callout *, sbintime_t sbt, sbintime_t pr, void (*fn)(void *), void *arg, int flags); , where pr used only for absolute precision, and flags includes: direct execution, absolute/relative time in argument, relative precision in case of relative sbt, flag for aligning to hardclock() to emulate legacy behavior, and potentially flags for reaction on suspend/resume. Another option is to move absolute precision also to flags, using log2() representation, as we tried and as was proposed before. With possibility to use precise relative time there will be few cases requiring absolute value of precision, that should depend on period. Then there will be no extra arguments in the most usual cases. Wrapper for existing API compatibility will look just like this: #define callout_reset(c, ticks, fn, arg)\ callout_reset_bt(c, ticks2sbintime(ticks), -1, \ (fn), (arg), C_HARDCLOCK) -- Alexander Motin ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: API explosion (Re: [RFC/RFT] calloutng)
dropping phk _AT_ onelab2 _DOT_ something from CC as long as it doesn't seem a valid mail address and I'm annoyed mails bounce back. On Wed, Dec 19, 2012 at 6:20 AM, Bruce Evans b...@optusnet.com.au wrote: On Wed, 19 Dec 2012, Davide Italiano wrote: On Wed, Dec 19, 2012 at 4:18 AM, Bruce Evans b...@optusnet.com.au wrote: I would have tried a 32 bit format with a variable named 'ticks'. Something like: - ticks = 0. Same meaning as now. No changes in ABIs or APIs to use this. The tick period would be constant but for virtual ticks and not too small. hz = 1000 now makes the period too small, and not a power of 2. So make the period 1/128 second. This gives a 1.24.7 binary format. 2**24 seconds is 194 days. - ticks 0. The 31 value bits are now a cookie (descriptor) referring to a bintime or whatever. This case should rarely be used. I don't like it that a tickless kernel, which is needed mainly for power saving, has expanded into complications to support short timeouts which should rarely be used. Bruce, I don't really agree with this. The data addressed by cookie should be still stored somewhere, and KBI will result broken. This, indeed, is not real problem as long as current calloutng code heavily breaks KBI, but if that was your point, I don't see how your proposed change could help. In the old API, it is an error to pass ticks 0, so only broken old callers are affected. Of course, if there are any then it would be hard to detect their garbage cookies. Anywy, it's too later to change to this, and maybe also to a 32.32 format. [32.32 format] It's not too late. What I'd like to do, right now people got interested in the problem is agreeing on the interface used. Following this thread, as I've already discussed to mav@, we would like to decide what of the two is better: - specify precision as additional argument (as we're doing right now) - use 'flags' argument If we allow time argument to be relative and not absolute, as suggested by luigi@, we can easily use relative precision where we had to use ffl() before. This would make a better general format than timevals, timespecs and of course bintimes :-). It is a bit wasteful for timeouts since its extremes are rarely used. Malicious and broken callers can still cause overflow at 68 years, so you have to check for it and handle it. The limit of 194 days is just as good for timeouts. I think the phk's proposal is better. About your overflow objection, I think is really unlikely to happen, but better safe than sorry. It's very easy for applications to cause kernel overflow using valid syscall args like tv_sec = TIME_T_MAX for a relative time in nanosleep(). Adding TIME_T_MAX to the current time in seconds overflow for all current times except for the first second after the Epoch. There is no difference between the overflow for 32-bit and 64-bit time_t's for this. This is now mostly handled so that the behaviour is harmless although wrong. E.g., the timeout might become negative, and then since it is not a cookie it is silently replaced by a timeout of 1 tick. In nanosleep(), IIRC there are further overflows that result in returning early instead of retrying the 1-tick timeouts endlessly. Bruce Thanks, Davide ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: API explosion (Re: [RFC/RFT] calloutng)
On Wed, 19 Dec 2012, Davide Italiano wrote: On Wed, Dec 19, 2012 at 4:18 AM, Bruce Evans b...@optusnet.com.au wrote: I would have tried a 32 bit format with a variable named 'ticks'. Something like: - ticks = 0. Same meaning as now. No changes in ABIs or APIs to use this. The tick period would be constant but for virtual ticks and not too small. hz = 1000 now makes the period too small, and not a power of 2. So make the period 1/128 second. This gives a 1.24.7 binary format. 2**24 seconds is 194 days. - ticks 0. The 31 value bits are now a cookie (descriptor) referring to a bintime or whatever. This case should rarely be used. I don't like it that a tickless kernel, which is needed mainly for power saving, has expanded into complications to support short timeouts which should rarely be used. Bruce, I don't really agree with this. The data addressed by cookie should be still stored somewhere, and KBI will result broken. This, indeed, is not real problem as long as current calloutng code heavily breaks KBI, but if that was your point, I don't see how your proposed change could help. In the old API, it is an error to pass ticks 0, so only broken old callers are affected. Of course, if there are any then it would be hard to detect their garbage cookies. Anywy, it's too later to change to this, and maybe also to a 32.32 format. [32.32 format] This would make a better general format than timevals, timespecs and of course bintimes :-). It is a bit wasteful for timeouts since its extremes are rarely used. Malicious and broken callers can still cause overflow at 68 years, so you have to check for it and handle it. The limit of 194 days is just as good for timeouts. I think the phk's proposal is better. About your overflow objection, I think is really unlikely to happen, but better safe than sorry. It's very easy for applications to cause kernel overflow using valid syscall args like tv_sec = TIME_T_MAX for a relative time in nanosleep(). Adding TIME_T_MAX to the current time in seconds overflow for all current times except for the first second after the Epoch. There is no difference between the overflow for 32-bit and 64-bit time_t's for this. This is now mostly handled so that the behaviour is harmless although wrong. E.g., the timeout might become negative, and then since it is not a cookie it is silently replaced by a timeout of 1 tick. In nanosleep(), IIRC there are further overflows that result in returning early instead of retrying the 1-tick timeouts endlessly. Bruce ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: API explosion (Re: [RFC/RFT] calloutng)
On Wed, 19 Dec 2012, Poul-Henning Kamp wrote: In message 20121220005706.i1...@besplex.bde.org, Bruce Evans writes: On Wed, 19 Dec 2012, Poul-Henning Kamp wrote: Except that for absolute timescales, we're running out of the 32 bits integer part. Except 32 bit time_t works until 2106 if it is unsigned. That's sort of not an option. I think it is. It is just probably not necessary since 32-bit systems will go away before 2038. The real problem was that time_t was not defined as a floating point number. That would be convenient too, but bad for efficiency on some systems. Kernels might not be able to use it, and then would have to use an alternative representation, which they should have done all along. [1] A good addition to C would be a general multi-word integer type where you could ask for any int%d_t or uint%d_t you cared for, and have the compiler DTRT. In difference from using a multiword-library, this would still give these types their natural integer behaviour. That would be convenient, but bad for efficiency if it were actually used much. You can say that about anything but CPU-native operations, and I doubt it would be as inefficient as struct bintime, which does not have access to the carry bit. Yes, I would say that about non-native. It goes against the spirit of C. OTOH, compilers are getting closer to giving full access to the carry bit. I just checked what clang does in a home-made 128-bit add function: % static void __noinline % uadd(struct u *xup, struct u *yup) % { % unsigned long long t; % % t = xup-w[0] + yup-w[0]; % if (t xup-w[0]) % xup-w[1]++; % xup-w[0] = t; % xup-w[1] += yup-w[1]; % } % % .align 16, 0x90 % .type uadd,@function % uadd: # @uadd % .cfi_startproc % # BB#0: # %entry % movq(%rdi), %rcx % movq8(%rdi), %rax % addq(%rsi), %rcx gcc generates an additional cmpq instruction here. % jae .LBB2_2 clang uses the carry bit set by the first addition to avoid the comparison, but still branches. % # BB#1: # %if.then % incq%rax % movq%rax, 8(%rdi) This adds 1 explicitly instead of using adcq, but this is the slow path. % .LBB2_2:# %if.end % movq%rcx, (%rdi) % addq8(%rsi), %rax This is as efficient as possible except for the extra branch, and the branch is almost perfectly predictable. % movq%rax, 8(%rdi) % ret % .Ltmp22: % .size uadd, .Ltmp22-uadd % .cfi_endproc Bruce ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: API explosion (Re: [RFC/RFT] calloutng)
In message 50d1e0d8.9070...@freebsd.org, Alexander Motin writes: It would be late to change this after committing. I would definitely like it to be done earlier to not redo all the tests, but I think we could convert callout and eventtimers code to 32.32 format in several days. The only two questions are: do we really want it (won't there be any reasons to regret about it) and how do we want it to look? As much as it pains me to raise this point, we would regret it if we did not use 32.32, because Linux already went that way. As much as there is to be said for doing things right, we should also try to avoid pointless incompatibilities which will make it needlessly hard for people to move code, particular device drivers forth and back. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 p...@freebsd.org | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: API explosion (Re: [RFC/RFT] calloutng)
I finally remembered to remove the .it phk :-). On Wed, 19 Dec 2012, Luigi Rizzo wrote: On Wed, Dec 19, 2012 at 10:51:48AM +, Poul-Henning Kamp wrote: ... As I said in my previous email: typedef dur_t int64_t;/* signed for bug catching */ #define DURSEC ((dur_t)1 32) #define DURMIN (DURSEC * 60) #define DURMSEC (DURSEC / 1000) #define DURUSEC (DURSEC / 1000) #define DURNSEC (DURSEC / 100) (Bikeshed the names at your convenience) Then you can say callout_foo(34 * DURSEC) callout_foo(2400 * DURMSEC) or callout_foo(500 * DURNSEC) only thing, we must be careful with the parentheses For instance, in your macro, DURNSEC evaluates to 0 and so does any multiple of it. We should define them as #define DURNSEC DURSEC / 100 ... so DURNSEC is still 0 and 500*DURNSEC gives 214 I am curious that Bruce did not mention this :) Er, he was careful. DURNSEC gives 4, not 0. This is not very accurate, but probably good enough. Your version without parentheses is not so careful and depends on a magic order of operations and no overflow from this. E.g.: 500*DURNSEC = 500*DURSEC / 10 = 500*((dur_t)1 32) / 10 This is very accurate and happens not to overflow. But 5 seconds represented a little strangely in nanoseconds would overflow: 50*DURNSEC = 50*((dur_t)1 32) / 10 So would 5 billion times DURSEC, but 5 billion seconds is more unreasobable than 5 billion nanoseconds and the format just can't represent that. (btw the typedef is swapped, should be typedef int64_t dur_t) Didn't notice this. Bruce ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: API explosion (Re: [RFC/RFT] calloutng)
On Mon, Dec 17, 2012 at 11:03:53PM +0200, Alexander Motin wrote: Hi. I would instead do the following: I also don't very like the wide API and want to hear fresh ideas, but approaches to time measurement there are too different to do what you are proposing. Main problem is that while ticks value is relative, bintime is absolute. It is not easy to make conversion between them fast and precise. I've managed to do it, but the only function that does it now is _callout_reset_on(). All other functions are just passing values down. I am not sure I want to duplicate that code in each place, though doing it at least for for callout may be a good idea. I am afraid the above is not convincing. Most/all of the APIs i mentioned still have the conversion from ticks to bintime, and the code in your patch is just building multiple parallel paths (one for each of the three versions of the same function) to some final piece of code where the conversion takes place. The problem is that all of this goes through a set of obfuscating macros and the end result is horrible. To be clear, i believe the work you have been doing on cleaning up callout is great, i am just saying that this is the time to look at the code from a few steps away and clean up all those design decisions that perhaps were made in a haste to make things work. I will give you another example to show how convoluted is the code now: cv_timedwait() and cv_timedwait_sig() now have three versions each (plain, bt, flags). These six are remapped through macros to two functions, _cv_timedwait() and _cv_timedwait_sig(), with a possible bug (cv_timedwait_bt() maps to _cv_timedwait_sig() ) These two _cv_timedwait*() take both ticks and bintimes, and contain this sequence: + if (bt == NULL) + sleepq_set_timeout_flags(cvp, timo, flags); + else + sleepq_set_timeout_bt(cvp, bt, precision); Guess what, both sleepq_* are macros that remap to the same _sleepq_set_timeout(...) . So the above if (bt == NULL) is useless. But then if you dig into _sleepq_set_timeout() you'll see + if (bt == NULL) + callout_reset_flags_on(td-td_slpcallout, timo, + sleepq_timeout, td, PCPU_GET(cpuid), flags | C_DIRECT_EXEC); + else + callout_reset_bt_on(td-td_slpcallout, bt, precision, + sleepq_timeout, td, PCPU_GET(cpuid), flags | C_DIRECT_EXEC); and again both callout_reset*() are again remapped through macros to _callout_reset_on(), so another useless if (bt == NULL) And in the end you have the conversion from ticks to bintime. So basically the code path for cv_timedwait() has those two useless switches and one useless extra argument, and the conversion from ticks to bintime is down deep down in _callout_reset_on() where it can only be resolved at runtime, whereas by doing the conversion at the beginning the decision could have been made at compile time. So I believe my proposal would give large simplifications in the code and lead to a much cleaner implementation of what you have designed: 1. acknowledge the fact that the only representation of time that callouts use internally is a bintime+precision, define one single function (instead of two or three or six) that implements the blessed API, and implement the others with macros or inline functions doing the appropriate conversions; 2. specifically, the *_flags() variant has no reason to exist. It can be implemented through the *_bt() variant, and being a new function the only places where you introduce it require manual modifications so you can directly invoke the new function. Again, please take this as constructive criticism, as i really like the work you have been doing and appreciate the time and effort you are putting on it cheers luigi Creating sets of three functions I had three different goals: - callout_reset() -- it is legacy variant required to keep API compatibility; - callout_reset_flags() -- it is for cases where custom precision specification needs to be added to the existing code, or where direct callout execution is needed. Conversion to bintime would additionally complicate consumer code, that I would try to avoid. - callout_reset_bt() -- API for the new code, which needs high precision and doesn't mind to operate bintime. Now there is only three such places in kernel now, and I don't think there will be much more. Respectively, these three options are replicated to other APIs where time intervals are used. PS: Please keep me in CC. -- Alexander Motin ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: API explosion (Re: [RFC/RFT] calloutng)
On Tue, Dec 18, 2012 at 06:36:43PM +0100, Luigi Rizzo wrote: On Mon, Dec 17, 2012 at 11:03:53PM +0200, Alexander Motin wrote: ... So I believe my proposal would give large simplifications in the code and lead to a much cleaner implementation of what you have designed: 1. acknowledge the fact that the only representation of time that callouts use internally is a bintime+precision, define one single function (instead of two or three or six) that implements the blessed API, and implement the others with macros or inline functions doing the appropriate conversions; 2. specifically, the *_flags() variant has no reason to exist. It can be implemented through the *_bt() variant, and being a new function the only places where you introduce it require manual modifications so you can directly invoke the new function. to clarify: i am not sure if now the *_bt() variant takes flags too, but my point is that the main API function should take all supported arguments (including flags) and others should simply be regarded as simplified versions. More or less what we have for sockets, with send() and sendmsg() and friend cheers luigi ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: API explosion (Re: [RFC/RFT] calloutng)
On 18.12.2012 19:36, Luigi Rizzo wrote: On Mon, Dec 17, 2012 at 11:03:53PM +0200, Alexander Motin wrote: I would instead do the following: I also don't very like the wide API and want to hear fresh ideas, but approaches to time measurement there are too different to do what you are proposing. Main problem is that while ticks value is relative, bintime is absolute. It is not easy to make conversion between them fast and precise. I've managed to do it, but the only function that does it now is _callout_reset_on(). All other functions are just passing values down. I am not sure I want to duplicate that code in each place, though doing it at least for for callout may be a good idea. I am afraid the above is not convincing. Most/all of the APIs i mentioned still have the conversion from ticks to bintime, and the code in your patch is just building multiple parallel paths (one for each of the three versions of the same function) to some final piece of code where the conversion takes place. The problem is that all of this goes through a set of obfuscating macros and the end result is horrible. To be clear, i believe the work you have been doing on cleaning up callout is great, i am just saying that this is the time to look at the code from a few steps away and clean up all those design decisions that perhaps were made in a haste to make things work. I will give you another example to show how convoluted is the code now: cv_timedwait() and cv_timedwait_sig() now have three versions each (plain, bt, flags). These six are remapped through macros to two functions, _cv_timedwait() and _cv_timedwait_sig(), with a possible bug (cv_timedwait_bt() maps to _cv_timedwait_sig() ) These two _cv_timedwait*() take both ticks and bintimes, and contain this sequence: + if (bt == NULL) + sleepq_set_timeout_flags(cvp, timo, flags); + else + sleepq_set_timeout_bt(cvp, bt, precision); Guess what, both sleepq_* are macros that remap to the same _sleepq_set_timeout(...) . So the above if (bt == NULL) is useless. But then if you dig into _sleepq_set_timeout() you'll see + if (bt == NULL) + callout_reset_flags_on(td-td_slpcallout, timo, + sleepq_timeout, td, PCPU_GET(cpuid), flags | C_DIRECT_EXEC); + else + callout_reset_bt_on(td-td_slpcallout, bt, precision, + sleepq_timeout, td, PCPU_GET(cpuid), flags | C_DIRECT_EXEC); and again both callout_reset*() are again remapped through macros to _callout_reset_on(), so another useless if (bt == NULL) And in the end you have the conversion from ticks to bintime. So basically the code path for cv_timedwait() has those two useless switches and one useless extra argument, and the conversion from ticks to bintime is down deep down in _callout_reset_on() where it can only be resolved at runtime, whereas by doing the conversion at the beginning the decision could have been made at compile time. So I believe my proposal would give large simplifications in the code and lead to a much cleaner implementation of what you have designed: 1. acknowledge the fact that the only representation of time that callouts use internally is a bintime+precision, define one single function (instead of two or three or six) that implements the blessed API, and implement the others with macros or inline functions doing the appropriate conversions; 2. specifically, the *_flags() variant has no reason to exist. It can be implemented through the *_bt() variant, and being a new function the only places where you introduce it require manual modifications so you can directly invoke the new function. Again, please take this as constructive criticism, as i really like the work you have been doing and appreciate the time and effort you are putting on it Your words about useless cascaded ifs touched me. Actually I've looked on _callout_reset_bt_on() yesterday, thinking about moving tick to bt conversion to separate function or wrapper. The only thing we would save in such case is single integer argument (ticks), as all others (bt, prec, flags) are used in the new world order. From the other side, to make the conversion process really effective and correct, I've used quite specific way of obtaining time, that may change in the future. I would not really like it to be inlined in every consumer function and become an ABI. So I see two possible ways: make that conversion a separate non-inline function (that will require two temporary variables to return results and will consume some time on call/return), or make callout_reset_bt_on() to have extra ticks argument, allowing to use it in one or another way without external ifs and macros. In last case all _bt functions in other APIs will also obtain ticks, bt, pr and flags arguments. Actually flags there could be used to specify time scale (monotonic or wall) and time base (relative or absolute), if we
Re: API explosion (Re: [RFC/RFT] calloutng)
On 18.12.2012 20:03, Alexander Motin wrote: On 18.12.2012 19:36, Luigi Rizzo wrote: On Mon, Dec 17, 2012 at 11:03:53PM +0200, Alexander Motin wrote: I would instead do the following: I also don't very like the wide API and want to hear fresh ideas, but approaches to time measurement there are too different to do what you are proposing. Main problem is that while ticks value is relative, bintime is absolute. It is not easy to make conversion between them fast and precise. I've managed to do it, but the only function that does it now is _callout_reset_on(). All other functions are just passing values down. I am not sure I want to duplicate that code in each place, though doing it at least for for callout may be a good idea. I am afraid the above is not convincing. Most/all of the APIs i mentioned still have the conversion from ticks to bintime, and the code in your patch is just building multiple parallel paths (one for each of the three versions of the same function) to some final piece of code where the conversion takes place. The problem is that all of this goes through a set of obfuscating macros and the end result is horrible. To be clear, i believe the work you have been doing on cleaning up callout is great, i am just saying that this is the time to look at the code from a few steps away and clean up all those design decisions that perhaps were made in a haste to make things work. I will give you another example to show how convoluted is the code now: cv_timedwait() and cv_timedwait_sig() now have three versions each (plain, bt, flags). These six are remapped through macros to two functions, _cv_timedwait() and _cv_timedwait_sig(), with a possible bug (cv_timedwait_bt() maps to _cv_timedwait_sig() ) These two _cv_timedwait*() take both ticks and bintimes, and contain this sequence: +if (bt == NULL) +sleepq_set_timeout_flags(cvp, timo, flags); +else +sleepq_set_timeout_bt(cvp, bt, precision); Guess what, both sleepq_* are macros that remap to the same _sleepq_set_timeout(...) . So the above if (bt == NULL) is useless. But then if you dig into _sleepq_set_timeout() you'll see +if (bt == NULL) +callout_reset_flags_on(td-td_slpcallout, timo, +sleepq_timeout, td, PCPU_GET(cpuid), flags | C_DIRECT_EXEC); +else +callout_reset_bt_on(td-td_slpcallout, bt, precision, +sleepq_timeout, td, PCPU_GET(cpuid), flags | C_DIRECT_EXEC); and again both callout_reset*() are again remapped through macros to _callout_reset_on(), so another useless if (bt == NULL) And in the end you have the conversion from ticks to bintime. So basically the code path for cv_timedwait() has those two useless switches and one useless extra argument, and the conversion from ticks to bintime is down deep down in _callout_reset_on() where it can only be resolved at runtime, whereas by doing the conversion at the beginning the decision could have been made at compile time. So I believe my proposal would give large simplifications in the code and lead to a much cleaner implementation of what you have designed: 1. acknowledge the fact that the only representation of time that callouts use internally is a bintime+precision, define one single function (instead of two or three or six) that implements the blessed API, and implement the others with macros or inline functions doing the appropriate conversions; 2. specifically, the *_flags() variant has no reason to exist. It can be implemented through the *_bt() variant, and being a new function the only places where you introduce it require manual modifications so you can directly invoke the new function. Again, please take this as constructive criticism, as i really like the work you have been doing and appreciate the time and effort you are putting on it Your words about useless cascaded ifs touched me. Actually I've looked on _callout_reset_bt_on() yesterday, thinking about moving tick to bt conversion to separate function or wrapper. The only thing we would save in such case is single integer argument (ticks), as all others (bt, prec, flags) are used in the new world order. From the other side, to make the conversion process really effective and correct, I've used quite specific way of obtaining time, that may change in the future. I would not really like it to be inlined in every consumer function and become an ABI. So I see two possible ways: make that conversion a separate non-inline function (that will require two temporary variables to return results and will consume some time on call/return), or make callout_reset_bt_on() to have extra ticks argument, allowing to use it in one or another way without external ifs and macros. In last case all _bt functions in other APIs will also obtain ticks, bt, pr and flags arguments. Actually flags there could be used to specify time scale (monotonic or wall) and time base (relative or absolute), if we decide to implement all
Re: API explosion (Re: [RFC/RFT] calloutng)
[top posting for readability; in summary we were discussing the new callout API trying to avoid an explosion of methods and arguments while at the same time supporting the old API and the new one] (I am also Cc-ing phk as he might have better insight on the topic). I think the patch you propose is a step in the right direction, but i still remain concerned by having to pass two bintimes (by reference, but they should really go by value) and one 'ticks' value to all these functions. I am also dubious that we need a full 128 bits to specify the 'precision': there would be absolutely no loss of functionality if we decided to specify the precision in powers of 2, so a precision 'k' (signed) means 2^k seconds. This way 8 bits are enough to represent any precision we want. The key difference between 'ticks' and bintimes (and the main difficulty in the conversion) is that ticks are relative and bintimes are interpreted as absolute. This could be easily solved by using a flag to specify if the 'bt' argument is absolute or relative, and passing the argument by value. So now the flags could contain C_DIRECT_EXEC, C_BT_IS_RELATIVE, the precision, and another 64 or 128 bit field contains the bintime. How does this look ? cheers luigi On 18.12.2012 20:03, Alexander Motin wrote: On 18.12.2012 19:36, Luigi Rizzo wrote: On Mon, Dec 17, 2012 at 11:03:53PM +0200, Alexander Motin wrote: I would instead do the following: I also don't very like the wide API and want to hear fresh ideas, but approaches to time measurement there are too different to do what you are proposing. Main problem is that while ticks value is relative, bintime is absolute. It is not easy to make conversion between them fast and precise. I've managed to do it, but the only function that does it now is _callout_reset_on(). All other functions are just passing values down. I am not sure I want to duplicate that code in each place, though doing it at least for for callout may be a good idea. I am afraid the above is not convincing. Most/all of the APIs i mentioned still have the conversion from ticks to bintime, and the code in your patch is just building multiple parallel paths (one for each of the three versions of the same function) to some final piece of code where the conversion takes place. The problem is that all of this goes through a set of obfuscating macros and the end result is horrible. To be clear, i believe the work you have been doing on cleaning up callout is great, i am just saying that this is the time to look at the code from a few steps away and clean up all those design decisions that perhaps were made in a haste to make things work. I will give you another example to show how convoluted is the code now: cv_timedwait() and cv_timedwait_sig() now have three versions each (plain, bt, flags). These six are remapped through macros to two functions, _cv_timedwait() and _cv_timedwait_sig(), with a possible bug (cv_timedwait_bt() maps to _cv_timedwait_sig() ) These two _cv_timedwait*() take both ticks and bintimes, and contain this sequence: +if (bt == NULL) +sleepq_set_timeout_flags(cvp, timo, flags); +else +sleepq_set_timeout_bt(cvp, bt, precision); Guess what, both sleepq_* are macros that remap to the same _sleepq_set_timeout(...) . So the above if (bt == NULL) is useless. But then if you dig into _sleepq_set_timeout() you'll see +if (bt == NULL) +callout_reset_flags_on(td-td_slpcallout, timo, +sleepq_timeout, td, PCPU_GET(cpuid), flags | C_DIRECT_EXEC); +else +callout_reset_bt_on(td-td_slpcallout, bt, precision, +sleepq_timeout, td, PCPU_GET(cpuid), flags | C_DIRECT_EXEC); and again both callout_reset*() are again remapped through macros to _callout_reset_on(), so another useless if (bt == NULL) And in the end you have the conversion from ticks to bintime. So basically the code path for cv_timedwait() has those two useless switches and one useless extra argument, and the conversion from ticks to bintime is down deep down in _callout_reset_on() where it can only be resolved at runtime, whereas by doing the conversion at the beginning the decision could have been made at compile time. So I believe my proposal would give large simplifications in the code and lead to a much cleaner implementation of what you have designed: 1. acknowledge the fact that the only representation of time that callouts use internally is a bintime+precision, define one single function (instead of two or three or six) that implements the blessed API, and implement the others with macros or inline functions doing the appropriate conversions; 2. specifically, the *_flags() variant has no reason to exist. It can be implemented through the *_bt() variant, and being a new function the only places where you introduce it require manual modifications so you can
Re: API explosion (Re: [RFC/RFT] calloutng)
On Tue, 2012-12-18 at 23:58 +0100, Luigi Rizzo wrote: [top posting for readability; in summary we were discussing the new callout API trying to avoid an explosion of methods and arguments while at the same time supporting the old API and the new one] (I am also Cc-ing phk as he might have better insight on the topic). I think the patch you propose is a step in the right direction, but i still remain concerned by having to pass two bintimes (by reference, but they should really go by value) and one 'ticks' value to all these functions. I am also dubious that we need a full 128 bits to specify the 'precision': there would be absolutely no loss of functionality if we decided to specify the precision in powers of 2, so a precision 'k' (signed) means 2^k seconds. This way 8 bits are enough to represent any precision we want. The key difference between 'ticks' and bintimes (and the main difficulty in the conversion) is that ticks are relative and bintimes are interpreted as absolute. This could be easily solved by using a flag to specify if the 'bt' argument is absolute or relative, and passing the argument by value. So now the flags could contain C_DIRECT_EXEC, C_BT_IS_RELATIVE, the precision, and another 64 or 128 bit field contains the bintime. How does this look ? cheers luigi I tend to agree that the bintime should be passed by value instead of reference. That would allow an inline tickstobintime() that converts relative ticks to an absolute bintime returned by value and passed right along in one tidy line/clump of code without any temporary variables cluttering things up. While the 1980s C programmer in me still wants to avoid returning complex objects by value, the reality is that modern compilers tend to generate really nice code for such constructs, usually without any copying of the return value at all. I'm not so sure about the 2^k precision. You speak of seconds, but I would be worrying about sub-second precision in my work. It would typical to want a 500uS timeout but be willing to late by up to 250uS if that helped scheduling and performance. Also, my idea of precision would virtually always be I'm willing to be late by this much, but never early by any amount. To reinforce the point of that last paragraph... the way I'm looking at these changes has nothing to do with power saving (I've never owned a battery-operated computer, probably never will) and everything to do with performance and being able to sleep accurately for less than a tick. -- Ian On 18.12.2012 20:03, Alexander Motin wrote: On 18.12.2012 19:36, Luigi Rizzo wrote: On Mon, Dec 17, 2012 at 11:03:53PM +0200, Alexander Motin wrote: I would instead do the following: I also don't very like the wide API and want to hear fresh ideas, but approaches to time measurement there are too different to do what you are proposing. Main problem is that while ticks value is relative, bintime is absolute. It is not easy to make conversion between them fast and precise. I've managed to do it, but the only function that does it now is _callout_reset_on(). All other functions are just passing values down. I am not sure I want to duplicate that code in each place, though doing it at least for for callout may be a good idea. I am afraid the above is not convincing. Most/all of the APIs i mentioned still have the conversion from ticks to bintime, and the code in your patch is just building multiple parallel paths (one for each of the three versions of the same function) to some final piece of code where the conversion takes place. The problem is that all of this goes through a set of obfuscating macros and the end result is horrible. To be clear, i believe the work you have been doing on cleaning up callout is great, i am just saying that this is the time to look at the code from a few steps away and clean up all those design decisions that perhaps were made in a haste to make things work. I will give you another example to show how convoluted is the code now: cv_timedwait() and cv_timedwait_sig() now have three versions each (plain, bt, flags). These six are remapped through macros to two functions, _cv_timedwait() and _cv_timedwait_sig(), with a possible bug (cv_timedwait_bt() maps to _cv_timedwait_sig() ) These two _cv_timedwait*() take both ticks and bintimes, and contain this sequence: +if (bt == NULL) +sleepq_set_timeout_flags(cvp, timo, flags); +else +sleepq_set_timeout_bt(cvp, bt, precision); Guess what, both sleepq_* are macros that remap to the same _sleepq_set_timeout(...) . So the above if (bt == NULL) is useless. But then if you dig into _sleepq_set_timeout() you'll see +if (bt == NULL) +callout_reset_flags_on(td-td_slpcallout, timo, +sleepq_timeout, td, PCPU_GET(cpuid), flags | C_DIRECT_EXEC); +else +
Re: API explosion (Re: [RFC/RFT] calloutng)
On Tue, Dec 18, 2012 at 04:27:45PM -0700, Ian Lepore wrote: On Tue, 2012-12-18 at 23:58 +0100, Luigi Rizzo wrote: [top posting for readability; in summary we were discussing the new callout API trying to avoid an explosion of methods and arguments while at the same time supporting the old API and the new one] (I am also Cc-ing phk as he might have better insight on the topic). I think the patch you propose is a step in the right direction, but i still remain concerned by having to pass two bintimes (by reference, but they should really go by value) and one 'ticks' value to all these functions. I am also dubious that we need a full 128 bits to specify the 'precision': there would be absolutely no loss of functionality if we decided to specify the precision in powers of 2, so a precision 'k' (signed) means 2^k seconds. This way 8 bits are enough to represent any precision we want. ... I'm not so sure about the 2^k precision. You speak of seconds, but I would be worrying about sub-second precision in my work. It would typical to want a 500uS timeout but be willing to late by up to 250uS if i said k is signed so negative values represent fractions of a second. 2^-128 is pretty short :) cheers luigi ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: API explosion (Re: [RFC/RFT] calloutng)
On Wed, 2012-12-19 at 00:29 +0100, Luigi Rizzo wrote: On Tue, Dec 18, 2012 at 04:27:45PM -0700, Ian Lepore wrote: On Tue, 2012-12-18 at 23:58 +0100, Luigi Rizzo wrote: [top posting for readability; in summary we were discussing the new callout API trying to avoid an explosion of methods and arguments while at the same time supporting the old API and the new one] (I am also Cc-ing phk as he might have better insight on the topic). I think the patch you propose is a step in the right direction, but i still remain concerned by having to pass two bintimes (by reference, but they should really go by value) and one 'ticks' value to all these functions. I am also dubious that we need a full 128 bits to specify the 'precision': there would be absolutely no loss of functionality if we decided to specify the precision in powers of 2, so a precision 'k' (signed) means 2^k seconds. This way 8 bits are enough to represent any precision we want. ... I'm not so sure about the 2^k precision. You speak of seconds, but I would be worrying about sub-second precision in my work. It would typical to want a 500uS timeout but be willing to late by up to 250uS if i said k is signed so negative values represent fractions of a second. 2^-128 is pretty short :) cheers luigi Ahh, I missed that. Good enough then! Hmmm, if that ideas survives further review, then could precision be encoded in 8 bits of the flags, eliminating another parm? -- Ian ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: API explosion (Re: [RFC/RFT] calloutng)
On Tue, Dec 18, 2012 at 04:37:10PM -0700, Ian Lepore wrote: On Wed, 2012-12-19 at 00:29 +0100, Luigi Rizzo wrote: On Tue, Dec 18, 2012 at 04:27:45PM -0700, Ian Lepore wrote: On Tue, 2012-12-18 at 23:58 +0100, Luigi Rizzo wrote: [top posting for readability; in summary we were discussing the new callout API trying to avoid an explosion of methods and arguments while at the same time supporting the old API and the new one] (I am also Cc-ing phk as he might have better insight on the topic). I think the patch you propose is a step in the right direction, but i still remain concerned by having to pass two bintimes (by reference, but they should really go by value) and one 'ticks' value to all these functions. I am also dubious that we need a full 128 bits to specify the 'precision': there would be absolutely no loss of functionality if we decided to specify the precision in powers of 2, so a precision 'k' (signed) means 2^k seconds. This way 8 bits are enough to represent any precision we want. ... I'm not so sure about the 2^k precision. You speak of seconds, but I would be worrying about sub-second precision in my work. It would typical to want a 500uS timeout but be willing to late by up to 250uS if i said k is signed so negative values represent fractions of a second. 2^-128 is pretty short :) cheers luigi Ahh, I missed that. Good enough then! Hmmm, if that ideas survives further review, then could precision be encoded in 8 bits of the flags, eliminating another parm? that was also what i wrote later in the message :) now we should figure out some use for the remaining 22 bits of the flags cheers luigi ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
API explosion (Re: [RFC/RFT] calloutng)
[again, response to another issue i raised] On Fri, Dec 14, 2012 at 01:57:36PM +0100, Davide Italiano wrote: On Fri, Dec 14, 2012 at 7:41 AM, Luigi Rizzo ri...@iet.unipi.it wrote: ... Finally, a more substantial comment: - a lot of functions which formerly had only a timo argument now have timo, bt, precision, flags. Take seltdwait() as an example. seltdwait() is not part of the public KPI. It has been modified to avoid code duplication. Having seltdwait() and seltdwait_bt(), i.e. two separate functions, even though we could share most of the code is not a clever approach, IMHO. As I told before, seltdwait() is not exposed so we can modify its argument without breaking anything. It seems that you have been undecided between two approaches: for some of these functions you have preserved the original function that deals with ticks and introduced a new one that deals with the bintime, whereas in other cases you have modified the original function to add bt, precision, flags. I'm not. All the functions which are part of the public KPI (e.g. condvar(9), sleepq(9), *) are still available. *_flags variants have been introduced so that consumers can take advantage of the new 'precision tolerance mechanism' implemented. Also, *_bt variants have been introduced. I don't see any undecision between the two approaches. Please note that now the callout backend deals with bintime, so every time callout_reset_on() is called, the 'tick' argument passed is silently converted to bintime. I will try to give more specific example, using the latest patch from mav http://people.freebsd.org/~mav/calloutng_12_16.patch In the manpage, for instance, the existing functions now are extended with two more variants (sometimes; msleep_spin() for instance is missing msleep_spin_bt() but perhaps that is just an oversight). .Nm sleepq_set_timeout , +.Nm sleepq_set_timeout_flags , +.Nm sleepq_set_timeout_bt , .Nm msleep , +.Nm msleep_flags , +.Nm msleep_bt , .Nm msleep_spin , +.Nm msleep_spin_flags , .Nm pause , +.Nm pause_flags , +.Nm pause_bt , .Nm tsleep , +.Nm tsleep_flags , +.Nm tsleep_bt , .Nm cv_timedwait , +.Nm cv_timedwait_bt , +.Nm cv_timedwait_flags , .Nm cv_timedwait_sig , +.Nm cv_timedwait_sig_bt , +.Nm cv_timedwait_sig_flags , .Nm callout_reset , +.Nm callout_reset_flags , .Nm callout_reset_on , +.Nm callout_reset_flags_on , +.Nm callout_reset_bt_on , If you look at the backends, they take both a timo and a bintime. -int_cv_timedwait(struct cv *cvp, struct lock_object *lock, int timo); -int_cv_timedwait_sig(struct cv *cvp, struct lock_object *lock, int timo); +int_cv_timedwait(struct cv *cvp, struct lock_object *lock, + struct bintime *bt, struct bintime *precision, int timo, + int flags); +int_cv_timedwait_sig(struct cv *cvp, struct lock_object *lock, + struct bintime *bt, struct bintime *precision, int timo, + int flags); and then internally they call the 'timo' or the 'bt' version depending on the case + if (bt == NULL) + sleepq_set_timeout_flags(cvp, timo, flags); + else + sleepq_set_timeout_bt(cvp, bt, precision); So basically you are doing the following: + create two new variant for each existing function foo(, ... timo, ... ) old method foo_flags(, ... timo, ... ) new method foo_bt(... , bt, precision, ...)new method (the 'API explosion' i am mentioning in the Subject) + the variants are mapped to the same internal function _foo_internal(..., timo, bt, precision, flags, ...) + and then the internal function has a (runtime) conditional if (bt == NULL) { // convert timo to bt } else { // have a bt + precision } ... I would instead do the following: + create a new API function that takes only bintime+precision+flags, no ticks. I am not sure if there is even a need to have an internal name _cv_timedwait_sig( ) or you can just have cv_timedwait_sig_bt(...) + use a macro or an inline function to remap the old API to the (single) new one, making the argument conversion immediately. #define cv_timedwait_sig(...) cv_timedwait_sig_bt( ...) This has the advantage that conversions might be done at compile time perhaps with some advantage in terms of code and performance. + do not bother creating yet another cv_timedwait_sig_flags() function. Since it did not exist before, you have to do the conversion manually anyways, at which point you just change the argument to use bintime instead of ticks.
Re: API explosion (Re: [RFC/RFT] calloutng)
Hi. I would instead do the following: I also don't very like the wide API and want to hear fresh ideas, but approaches to time measurement there are too different to do what you are proposing. Main problem is that while ticks value is relative, bintime is absolute. It is not easy to make conversion between them fast and precise. I've managed to do it, but the only function that does it now is _callout_reset_on(). All other functions are just passing values down. I am not sure I want to duplicate that code in each place, though doing it at least for for callout may be a good idea. Creating sets of three functions I had three different goals: - callout_reset() -- it is legacy variant required to keep API compatibility; - callout_reset_flags() -- it is for cases where custom precision specification needs to be added to the existing code, or where direct callout execution is needed. Conversion to bintime would additionally complicate consumer code, that I would try to avoid. - callout_reset_bt() -- API for the new code, which needs high precision and doesn't mind to operate bintime. Now there is only three such places in kernel now, and I don't think there will be much more. Respectively, these three options are replicated to other APIs where time intervals are used. PS: Please keep me in CC. -- Alexander Motin ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org