Re: [patch] change futex_wait() to hrtimers
> BTW. my futex man page says timeout's contents "describe the maximum duration > of the wait". Surely that should be *minimum*? Michael cc'ed. Er, the intent of the wording is to say "futex will wait until uaddr no longer contains val, or the timeout expires, whichever happens first". One option for selecting different clock resolutions is to use the clockid_t from the POSIX clock_gettime() family. That is, specify the clock that a wait uses, and then have a separate mechanism for turning a resolution requirement into a clockid_t. (And there can be default clocks for interfaces that don't specify one explicitly.) Although clockid_t is pretty generic, it's biased toward an enumerated list of clocks rather than a continuous resolution. Fortunately, that seems to match the implementation ideas. The question is how much the timeout gets rounded, and the choices are currently jiffies or microseconds. A related option may be whether rounding down is acceptable. For some applications (periodic polling for events), it's fine. For others, it's not. Thus, while it's okay to specify such clocks explicitly, it'd probably be a good idea to forbid selecting them as the default for interfaces that don't specify a clock explicitly. I had some code that suffered 1 ms buzz-loops on Solaris because poll(2) would round the timeout interval down, but the loop calling it would explicitly check whether the timeout had expired using gettimeofday() and would keep re-invoking poll(pollfds, npollfds, 1) until the timeout really did expire. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
BTW. my futex man page says timeout's contents describe the maximum duration of the wait. Surely that should be *minimum*? Michael cc'ed. Er, the intent of the wording is to say futex will wait until uaddr no longer contains val, or the timeout expires, whichever happens first. One option for selecting different clock resolutions is to use the clockid_t from the POSIX clock_gettime() family. That is, specify the clock that a wait uses, and then have a separate mechanism for turning a resolution requirement into a clockid_t. (And there can be default clocks for interfaces that don't specify one explicitly.) Although clockid_t is pretty generic, it's biased toward an enumerated list of clocks rather than a continuous resolution. Fortunately, that seems to match the implementation ideas. The question is how much the timeout gets rounded, and the choices are currently jiffies or microseconds. A related option may be whether rounding down is acceptable. For some applications (periodic polling for events), it's fine. For others, it's not. Thus, while it's okay to specify such clocks explicitly, it'd probably be a good idea to forbid selecting them as the default for interfaces that don't specify a clock explicitly. I had some code that suffered 1 ms buzz-loops on Solaris because poll(2) would round the timeout interval down, but the loop calling it would explicitly check whether the timeout had expired using gettimeofday() and would keep re-invoking poll(pollfds, npollfds, 1) until the timeout really did expire. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
* Theodore Tso <[EMAIL PROTECTED]> wrote: > What we probably need in the long-term, and not just for high > precision wakeups, is we need a way for waiters (either in the kernel > or in userspace) to specify a desired precision in their timers. Is > it, "wake me up in a second, exactly", or "wake me up in a second, > plus or minus 10ms"? (or 50ms? or 100ms?). such a facility exists already, see round_jiffies() and round_jiffies_relative(). There's some short blurb about it at: http://kernelnewbies.org/LinuxChanges#head-513ceda14f5d8cf5b8a7c81d7e3821543141ecb0 > This becomes especially important if we want the tickless code to > really shine as far as power management is concerned. [...] yes. That's why we also implemented /proc/timer_stat, and this was measured and a few higher-frequency fuzzy waiters were converted to use round_jiffies(). Some other waiters were fixed in user-space. It's all dependent on actual measurements and circumstances. Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
On Mon, Mar 12, 2007 at 10:12:14AM -0400, Theodore Tso wrote: > On Mon, Mar 12, 2007 at 11:58:26AM +0100, Andi Kleen wrote: > > On Mon, Mar 12, 2007 at 12:00:20PM +0100, Thomas Gleixner wrote: > > > On Mon, 2007-03-12 at 12:27 +0100, Andi Kleen wrote: > > > > Ingo Molnar <[EMAIL PROTECTED]> writes: > > > > > > > > > > the only correct approach is the use of hrtimers, and a patch exists > > > > > for > > > > > that - see below. This has been included in -rt for quite some time. > > > > > > > > But isn't that bad for power management? You'll likely get more > > > > idle wakeups, won't you? > > > > > > Why so ? It comes more precise, but only once. > > > > When it's clustered around the jiffies interval then wakeups from > > multiple processes will be somewhat batched. With a precise wakeup you'll > > get wakeups all over the jiffies period, won't you? > > What we probably need in the long-term, and not just for high > precision wakeups, is we need a way for waiters (either in the kernel > or in userspace) to specify a desired precision in their timers. Is > it, "wake me up in a second, exactly", or "wake me up in a second, > plus or minus 10ms"? (or 50ms? or 100ms?). Would this work, or will it just create more confusion for the API user? I mean, all sleeps can only guarantee "no less than". Would it be enough for a binary (exact as possible / relaxed if needed) flag? Or perhaps ternary (exact/relaxed/batched), where relaxed could add an extra jiffy or so, and batched is really relaxed that may delay up to double the value of the timeout. > This becomes especially important if we want the tickless code to > really shine as far as power management is concerned. Unfortunately, > the POSIX timer abstraction doesn't give this kind of flexibility > easily, so it's going to be a while before we see significant > userspace adoption of such a kernel feature, but I think it's > something that would be still worthwhile to add. But given that we know most userspace API timeouts are broadly just an "equal to or greater", then we could add another timeout flag to specify it is a userspace timeout, and make that controllable by sysctl. Sure it isn't ideal, but for those who really want power / hypervisor savings, it could be useful. BTW. my futex man page says timeout's contents "describe the maximum duration of the wait". Surely that should be *minimum*? Michael cc'ed. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
> This becomes especially important if we want the tickless code to > really shine as far as power management is concerned. Unfortunately, > the POSIX timer abstraction doesn't give this kind of flexibility > easily, so it's going to be a while before we see significant > userspace adoption of such a kernel feature, but I think it's > something that would be still worthwhile to add. I suspect it would be overkill to specify that on every sleep operation. 99% of applications won't care and for the 1% leftover a global per process setting should be fine. I think a single prctl() and a global sysctl as default would be enough. -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
On Mon, Mar 12, 2007 at 11:58:26AM +0100, Andi Kleen wrote: > On Mon, Mar 12, 2007 at 12:00:20PM +0100, Thomas Gleixner wrote: > > On Mon, 2007-03-12 at 12:27 +0100, Andi Kleen wrote: > > > Ingo Molnar <[EMAIL PROTECTED]> writes: > > > > > > > > the only correct approach is the use of hrtimers, and a patch exists > > > > for > > > > that - see below. This has been included in -rt for quite some time. > > > > > > But isn't that bad for power management? You'll likely get more > > > idle wakeups, won't you? > > > > Why so ? It comes more precise, but only once. > > When it's clustered around the jiffies interval then wakeups from > multiple processes will be somewhat batched. With a precise wakeup you'll > get wakeups all over the jiffies period, won't you? What we probably need in the long-term, and not just for high precision wakeups, is we need a way for waiters (either in the kernel or in userspace) to specify a desired precision in their timers. Is it, "wake me up in a second, exactly", or "wake me up in a second, plus or minus 10ms"? (or 50ms? or 100ms?). This becomes especially important if we want the tickless code to really shine as far as power management is concerned. Unfortunately, the POSIX timer abstraction doesn't give this kind of flexibility easily, so it's going to be a while before we see significant userspace adoption of such a kernel feature, but I think it's something that would be still worthwhile to add. Regards, - Ted - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
On Mon, Mar 12, 2007 at 01:21:03PM +0100, Ingo Molnar wrote: > > * Nick Piggin <[EMAIL PROTECTED]> wrote: > > > > > > the issue is this: your fix reduces the effects of the bug but > > > > > it is still fundamentally incomplete because of the use of > > > > > timer_list. So > > > > > > > > But using schedule_timeout is not a bug. Userspace timeouts are > > > > always defined to be "at least". > > > > > > but what you are adding isnt a plain schedule_timeout(), it is a > > > restart block handling loop. And for those restart blocks that > > > relate to timeouts, we only use hrtimers. I am not making this up to > > > annoy you: take a look at all the current restart block handlers - > > > they are hrtimer based, for exactly this reason. > > > > So why do you say it is fundamentally incomplete? > > because i misread your last patch :-) I thought it still has a window > for inaccuracy, but you are right: it should be at most 1 jiffy > inaccurate, no matter how many times we restart. OK, no problem. > still ... the hrtimers patch has been submitted to lkml before yours, > and has been tested extensively, so why go the extra side-jump > prolonging the jiffies sleep method? The LTP failure has been there > since the inception of the futex code i suspect. Going this way also > enables the addressing of a more pressing need: the elimination of > glibc's forced use of relative futex timeouts. I guess my arguments are that my patch fixes a bug, which gives it a higher priority (being a userspace API bug, perhaps even 2.6.21); and that it will want to be backported while the hrtimer patch will not, so including the hrtimer patch first means 2 different patches to fix the same bug. I'm not trying to make life harder for the hrtimer patch. I will even volunteer to forward port it on top of the restart fix, if that is an issue. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
* Nick Piggin <[EMAIL PROTECTED]> wrote: > > > > the issue is this: your fix reduces the effects of the bug but > > > > it is still fundamentally incomplete because of the use of > > > > timer_list. So > > > > > > But using schedule_timeout is not a bug. Userspace timeouts are > > > always defined to be "at least". > > > > but what you are adding isnt a plain schedule_timeout(), it is a > > restart block handling loop. And for those restart blocks that > > relate to timeouts, we only use hrtimers. I am not making this up to > > annoy you: take a look at all the current restart block handlers - > > they are hrtimer based, for exactly this reason. > > So why do you say it is fundamentally incomplete? because i misread your last patch :-) I thought it still has a window for inaccuracy, but you are right: it should be at most 1 jiffy inaccurate, no matter how many times we restart. still ... the hrtimers patch has been submitted to lkml before yours, and has been tested extensively, so why go the extra side-jump prolonging the jiffies sleep method? The LTP failure has been there since the inception of the futex code i suspect. Going this way also enables the addressing of a more pressing need: the elimination of glibc's forced use of relative futex timeouts. Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
On Mon, Mar 12, 2007 at 12:38:29PM +0100, Ingo Molnar wrote: > > * Nick Piggin <[EMAIL PROTECTED]> wrote: > > > > the issue is this: your fix reduces the effects of the bug but it is > > > still fundamentally incomplete because of the use of timer_list. So > > > > But using schedule_timeout is not a bug. Userspace timeouts are always > > defined to be "at least". > > but what you are adding isnt a plain schedule_timeout(), it is a restart > block handling loop. And for those restart blocks that relate to > timeouts, we only use hrtimers. I am not making this up to annoy you: > take a look at all the current restart block handlers - they are hrtimer > based, for exactly this reason. So why do you say it is fundamentally incomplete? > > > instead of trying to fix the bug the wrong way, please try to fix it > > > the right way, ontop of an already existing and tested patch, ok? > > > That also enables the other neat stuff Thomas talked about. > > > > Well that's nice, but I have a bugfix here which probably needs to get > > backported to stable kernels and distro kernels. > > yes but your patch already exists for them which they can pick up. > > really, this is a common Linux principle: fix it completely and fix it > the right way. You are applying it yourself on a daily basis when having > the maintainer hat on =B-) I still didn't get anything wrong pointed out with the patch, though. I'm not arguing against using hrtimers here to fix it the "right way". > > It should be just as easy to rebase the hrtimer patch on top of my > > fix. Considering that you've had it for a year, I don't think it needs > > to be added right before my fix. > > your latest patch looks quite kludgy, exactly due to the issues that > were mentioned. I don't see what is kludgy, unless you consider converting to and from absolute timeouts. But the userspace API is relative time based, so hrtimers doesn't change that. > > > hm. I'm wondering how this wasnt noticed sooner - this futex_wait > > > behavior has been there for like forever. > > > > People ignore LTP test failures, and programs probably try to avoid > > exercising the nuances of the unix signal API, I guess. > > then there's no rush and lets do this the right way, ok? There is no rush to use hrtimers. I would have thought it fairly important to actually reach correctness, though. We're not talking about completely changing the design of something such that it will take a lot of work to "fix it properly". If that were the issue, then I would consider the hrtimer conversion as part of the fix. And if you talk about doing it the right way, then I don't think it is strictly the right way to reimplement the function, including known bugs, to be slightly more efficient, and *then* fixing those bugs. I'd actually consider it better to fix the bugs first, not only because of the backport issue, but because it generally makes it easier to track the injection and removal points of bugs in the history. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
* Nick Piggin <[EMAIL PROTECTED]> wrote: > > the issue is this: your fix reduces the effects of the bug but it is > > still fundamentally incomplete because of the use of timer_list. So > > But using schedule_timeout is not a bug. Userspace timeouts are always > defined to be "at least". but what you are adding isnt a plain schedule_timeout(), it is a restart block handling loop. And for those restart blocks that relate to timeouts, we only use hrtimers. I am not making this up to annoy you: take a look at all the current restart block handlers - they are hrtimer based, for exactly this reason. > > instead of trying to fix the bug the wrong way, please try to fix it > > the right way, ontop of an already existing and tested patch, ok? > > That also enables the other neat stuff Thomas talked about. > > Well that's nice, but I have a bugfix here which probably needs to get > backported to stable kernels and distro kernels. yes but your patch already exists for them which they can pick up. really, this is a common Linux principle: fix it completely and fix it the right way. You are applying it yourself on a daily basis when having the maintainer hat on =B-) > It should be just as easy to rebase the hrtimer patch on top of my > fix. Considering that you've had it for a year, I don't think it needs > to be added right before my fix. your latest patch looks quite kludgy, exactly due to the issues that were mentioned. > > hm. I'm wondering how this wasnt noticed sooner - this futex_wait > > behavior has been there for like forever. > > People ignore LTP test failures, and programs probably try to avoid > exercising the nuances of the unix signal API, I guess. then there's no rush and lets do this the right way, ok? Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
* Andi Kleen <[EMAIL PROTECTED]> wrote: > > if HIGH_RES_TIMERS is disabled then that is what happens. But > > frankly, > > disabled? I would expect it (= more wakeups) when hrtimers are > enabled. i mean the groupping of timer expiries happens automatically when high-res is disabled. When high-res is asked for then, duh, it's enabled and you get precise timeouts ;-) > > most futex waits are without timeouts - if an application cares > > about micro-effects like that then you are much better off not using > > a per-futex timeout anywa > > That sounds like you're arguing for not using hrtimers here because > the applications shouldn't depend on precise timeouts here anyways?!? I was talking about the "micro-effect" of grouping timer expiries. > Anyways when you convert more kernel timeouts to hrtimers you should > probably add some kind of batching facility that can be globally > configured with a sysctl or similar. Then at least laptops (and > possibly servers) can opt for more power saving again. For the futexes > alone it probably won't matter too much agreed, but I see a trend to > more hr. yeah, we had that in earlier versions, it's trivial - nobody used it. So i'll wait for the actual measurements and a patch (or i can do the patch too, if someone comes up with the measurements). I've added /proc/timer_stats and /proc/timer_info for exactly such reasons. ( note that we've added the facility for even more imprecise sleeps to the timer_list APIs - but for hrtimers it's a lot less clear-cut case. ) Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
On Mon, Mar 12, 2007 at 12:19:58PM +0100, Ingo Molnar wrote: > > * Nick Piggin <[EMAIL PROTECTED]> wrote: > > > > even if this means more work for you (i'm sorry about that!) i'm > > > quite sure we should take Sebastien's hrtimers based implementation > > > of futex_wait(), and use the nanosleep method to restart it. There's > > > no point in further tweaking the imprecise approach: whenever some > > > timeout needs to be restarted, it's a candidate for hrtimers. > > > > Absolute timeout is needed, sure. But once that is done, hrtimers does > > not fix a bug, does it? > > the issue is this: your fix reduces the effects of the bug but it is > still fundamentally incomplete because of the use of timer_list. So But using schedule_timeout is not a bug. Userspace timeouts are always defined to be "at least". > instead of trying to fix the bug the wrong way, please try to fix it the > right way, ontop of an already existing and tested patch, ok? That also > enables the other neat stuff Thomas talked about. Well that's nice, but I have a bugfix here which probably needs to get backported to stable kernels and distro kernels. It should be just as easy to rebase the hrtimer patch on top of my fix. Considering that you've had it for a year, I don't think it needs to be added right before my fix. > > > until then, glibc already handles timeouts and restarts it manually. > > > > It isn't timeout handling that is buggy, but EINTR behaviour. And > > glibc does not handle that here. > > hm. I'm wondering how this wasnt noticed sooner - this futex_wait > behavior has been there for like forever. People ignore LTP test failures, and programs probably try to avoid exercising the nuances of the unix signal API, I guess. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
> if HIGH_RES_TIMERS is disabled then that is what happens. But frankly, disabled? I would expect it (= more wakeups) when hrtimers are enabled. > most futex waits are without timeouts - if an application cares about > micro-effects like that then you are much better off not using a > per-futex timeout anywa That sounds like you're arguing for not using hrtimers here because the applications shouldn't depend on precise timeouts here anyways?!? Anyways when you convert more kernel timeouts to hrtimers you should probably add some kind of batching facility that can be globally configured with a sysctl or similar. Then at least laptops (and possibly servers) can opt for more power saving again. For the futexes alone it probably won't matter too much agreed, but I see a trend to more hr. I also liked the idea (stolen from another popular OS) that a application can tell the OS how precise it wants its wakeups to be. -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
* Nick Piggin <[EMAIL PROTECTED]> wrote: > > even if this means more work for you (i'm sorry about that!) i'm > > quite sure we should take Sebastien's hrtimers based implementation > > of futex_wait(), and use the nanosleep method to restart it. There's > > no point in further tweaking the imprecise approach: whenever some > > timeout needs to be restarted, it's a candidate for hrtimers. > > Absolute timeout is needed, sure. But once that is done, hrtimers does > not fix a bug, does it? the issue is this: your fix reduces the effects of the bug but it is still fundamentally incomplete because of the use of timer_list. So instead of trying to fix the bug the wrong way, please try to fix it the right way, ontop of an already existing and tested patch, ok? That also enables the other neat stuff Thomas talked about. > > until then, glibc already handles timeouts and restarts it manually. > > It isn't timeout handling that is buggy, but EINTR behaviour. And > glibc does not handle that here. hm. I'm wondering how this wasnt noticed sooner - this futex_wait behavior has been there for like forever. Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
On Mon, 2007-03-12 at 12:02 +0100, Ingo Molnar wrote: > > Well I did convert futex_wait to an absolute timeout based version in > > the subsequent incremental patch. I think that is OK? > > it still has the rounding artifacts: using timer_list there is no way to > do a precise long sleep based on many small sleeps. > > even if this means more work for you (i'm sorry about that!) i'm quite > sure we should take Sebastien's hrtimers based implementation of > futex_wait(), and use the nanosleep method to restart it. There's no > point in further tweaking the imprecise approach: whenever some timeout > needs to be restarted, it's a candidate for hrtimers. > > until then, glibc already handles timeouts and restarts it manually. This also allows us to add a seperate absolute time bases futex op, which allows to remove the conversion of abstime to reltime in glibc. tglx - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
On Mon, Mar 12, 2007 at 12:02:04PM +0100, Ingo Molnar wrote: > > * Nick Piggin <[EMAIL PROTECTED]> wrote: > > > > i dont think we should try to do this. We should not and cannot do > > > anything about all of the artifacts that comes with the use of > > > relative timeouts and schedule_timeout(). > > > > > > basically, using jiffies here (which schedule_timeout() does) is > > > /fundamentally/ imprecise. If you get many interrupts, rounding > > > errors sum up - and there's nothing we can do about it! > > > > Well I did convert futex_wait to an absolute timeout based version in > > the subsequent incremental patch. I think that is OK? > > it still has the rounding artifacts: using timer_list there is no way to > do a precise long sleep based on many small sleeps. OK but that is nothing to do with my patch, but the original futex_wait implementation. > even if this means more work for you (i'm sorry about that!) i'm quite > sure we should take Sebastien's hrtimers based implementation of > futex_wait(), and use the nanosleep method to restart it. There's no > point in further tweaking the imprecise approach: whenever some timeout > needs to be restarted, it's a candidate for hrtimers. Absolute timeout is needed, sure. But once that is done, hrtimers does not fix a bug, does it? > > until then, glibc already handles timeouts and restarts it manually. It isn't timeout handling that is buggy, but EINTR behaviour. And glibc does not handle that here. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
* Andi Kleen <[EMAIL PROTECTED]> wrote: > On Mon, Mar 12, 2007 at 12:00:20PM +0100, Thomas Gleixner wrote: > > On Mon, 2007-03-12 at 12:27 +0100, Andi Kleen wrote: > > > Ingo Molnar <[EMAIL PROTECTED]> writes: > > > > > > > > the only correct approach is the use of hrtimers, and a patch exists > > > > for > > > > that - see below. This has been included in -rt for quite some time. > > > > > > But isn't that bad for power management? You'll likely get more > > > idle wakeups, won't you? > > > > Why so ? It comes more precise, but only once. > > When it's clustered around the jiffies interval then wakeups from > multiple processes will be somewhat batched. With a precise wakeup > you'll get wakeups all over the jiffies period, won't you? if HIGH_RES_TIMERS is disabled then that is what happens. But frankly, most futex waits are without timeouts - if an application cares about micro-effects like that then you are much better off not using a per-futex timeout anyway. Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
* Nick Piggin <[EMAIL PROTECTED]> wrote: > > i dont think we should try to do this. We should not and cannot do > > anything about all of the artifacts that comes with the use of > > relative timeouts and schedule_timeout(). > > > > basically, using jiffies here (which schedule_timeout() does) is > > /fundamentally/ imprecise. If you get many interrupts, rounding > > errors sum up - and there's nothing we can do about it! > > Well I did convert futex_wait to an absolute timeout based version in > the subsequent incremental patch. I think that is OK? it still has the rounding artifacts: using timer_list there is no way to do a precise long sleep based on many small sleeps. even if this means more work for you (i'm sorry about that!) i'm quite sure we should take Sebastien's hrtimers based implementation of futex_wait(), and use the nanosleep method to restart it. There's no point in further tweaking the imprecise approach: whenever some timeout needs to be restarted, it's a candidate for hrtimers. until then, glibc already handles timeouts and restarts it manually. Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
On Mon, Mar 12, 2007 at 12:00:20PM +0100, Thomas Gleixner wrote: > On Mon, 2007-03-12 at 12:27 +0100, Andi Kleen wrote: > > Ingo Molnar <[EMAIL PROTECTED]> writes: > > > > > > the only correct approach is the use of hrtimers, and a patch exists for > > > that - see below. This has been included in -rt for quite some time. > > > > But isn't that bad for power management? You'll likely get more > > idle wakeups, won't you? > > Why so ? It comes more precise, but only once. When it's clustered around the jiffies interval then wakeups from multiple processes will be somewhat batched. With a precise wakeup you'll get wakeups all over the jiffies period, won't you? -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
On Mon, 2007-03-12 at 12:27 +0100, Andi Kleen wrote: > Ingo Molnar <[EMAIL PROTECTED]> writes: > > > > the only correct approach is the use of hrtimers, and a patch exists for > > that - see below. This has been included in -rt for quite some time. > > But isn't that bad for power management? You'll likely get more > idle wakeups, won't you? Why so ? It comes more precise, but only once. tglx - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
Ingo Molnar <[EMAIL PROTECTED]> writes: > > the only correct approach is the use of hrtimers, and a patch exists for > that - see below. This has been included in -rt for quite some time. But isn't that bad for power management? You'll likely get more idle wakeups, won't you? -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
On Mon, Mar 12, 2007 at 10:10:06AM +0100, Ingo Molnar wrote: > > * Roland McGrath <[EMAIL PROTECTED]> wrote: > > > I agree it should restart. But I don't think this is quite right in > > the timeout case. It will increase the total maximum real time spent > > arbitrarily by the amount of time elapsed in signal handlers. Other > > restartable, timed calls have to convert to an absolute timeout for > > the restart block (and convert back when doing the restart). > > i dont think we should try to do this. We should not and cannot do > anything about all of the artifacts that comes with the use of relative > timeouts and schedule_timeout(). > > basically, using jiffies here (which schedule_timeout() does) is > /fundamentally/ imprecise. If you get many interrupts, rounding errors > sum up - and there's nothing we can do about it! Well I did convert futex_wait to an absolute timeout based version in the subsequent incremental patch. I think that is OK? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
On Mon, Mar 12, 2007 at 10:10:06AM +0100, Ingo Molnar wrote: * Roland McGrath [EMAIL PROTECTED] wrote: I agree it should restart. But I don't think this is quite right in the timeout case. It will increase the total maximum real time spent arbitrarily by the amount of time elapsed in signal handlers. Other restartable, timed calls have to convert to an absolute timeout for the restart block (and convert back when doing the restart). i dont think we should try to do this. We should not and cannot do anything about all of the artifacts that comes with the use of relative timeouts and schedule_timeout(). basically, using jiffies here (which schedule_timeout() does) is /fundamentally/ imprecise. If you get many interrupts, rounding errors sum up - and there's nothing we can do about it! Well I did convert futex_wait to an absolute timeout based version in the subsequent incremental patch. I think that is OK? - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
Ingo Molnar [EMAIL PROTECTED] writes: the only correct approach is the use of hrtimers, and a patch exists for that - see below. This has been included in -rt for quite some time. But isn't that bad for power management? You'll likely get more idle wakeups, won't you? -Andi - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
On Mon, 2007-03-12 at 12:27 +0100, Andi Kleen wrote: Ingo Molnar [EMAIL PROTECTED] writes: the only correct approach is the use of hrtimers, and a patch exists for that - see below. This has been included in -rt for quite some time. But isn't that bad for power management? You'll likely get more idle wakeups, won't you? Why so ? It comes more precise, but only once. tglx - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
On Mon, Mar 12, 2007 at 12:00:20PM +0100, Thomas Gleixner wrote: On Mon, 2007-03-12 at 12:27 +0100, Andi Kleen wrote: Ingo Molnar [EMAIL PROTECTED] writes: the only correct approach is the use of hrtimers, and a patch exists for that - see below. This has been included in -rt for quite some time. But isn't that bad for power management? You'll likely get more idle wakeups, won't you? Why so ? It comes more precise, but only once. When it's clustered around the jiffies interval then wakeups from multiple processes will be somewhat batched. With a precise wakeup you'll get wakeups all over the jiffies period, won't you? -Andi - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
* Nick Piggin [EMAIL PROTECTED] wrote: i dont think we should try to do this. We should not and cannot do anything about all of the artifacts that comes with the use of relative timeouts and schedule_timeout(). basically, using jiffies here (which schedule_timeout() does) is /fundamentally/ imprecise. If you get many interrupts, rounding errors sum up - and there's nothing we can do about it! Well I did convert futex_wait to an absolute timeout based version in the subsequent incremental patch. I think that is OK? it still has the rounding artifacts: using timer_list there is no way to do a precise long sleep based on many small sleeps. even if this means more work for you (i'm sorry about that!) i'm quite sure we should take Sebastien's hrtimers based implementation of futex_wait(), and use the nanosleep method to restart it. There's no point in further tweaking the imprecise approach: whenever some timeout needs to be restarted, it's a candidate for hrtimers. until then, glibc already handles timeouts and restarts it manually. Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
* Andi Kleen [EMAIL PROTECTED] wrote: On Mon, Mar 12, 2007 at 12:00:20PM +0100, Thomas Gleixner wrote: On Mon, 2007-03-12 at 12:27 +0100, Andi Kleen wrote: Ingo Molnar [EMAIL PROTECTED] writes: the only correct approach is the use of hrtimers, and a patch exists for that - see below. This has been included in -rt for quite some time. But isn't that bad for power management? You'll likely get more idle wakeups, won't you? Why so ? It comes more precise, but only once. When it's clustered around the jiffies interval then wakeups from multiple processes will be somewhat batched. With a precise wakeup you'll get wakeups all over the jiffies period, won't you? if HIGH_RES_TIMERS is disabled then that is what happens. But frankly, most futex waits are without timeouts - if an application cares about micro-effects like that then you are much better off not using a per-futex timeout anyway. Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
On Mon, 2007-03-12 at 12:02 +0100, Ingo Molnar wrote: Well I did convert futex_wait to an absolute timeout based version in the subsequent incremental patch. I think that is OK? it still has the rounding artifacts: using timer_list there is no way to do a precise long sleep based on many small sleeps. even if this means more work for you (i'm sorry about that!) i'm quite sure we should take Sebastien's hrtimers based implementation of futex_wait(), and use the nanosleep method to restart it. There's no point in further tweaking the imprecise approach: whenever some timeout needs to be restarted, it's a candidate for hrtimers. until then, glibc already handles timeouts and restarts it manually. This also allows us to add a seperate absolute time bases futex op, which allows to remove the conversion of abstime to reltime in glibc. tglx - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
On Mon, Mar 12, 2007 at 12:02:04PM +0100, Ingo Molnar wrote: * Nick Piggin [EMAIL PROTECTED] wrote: i dont think we should try to do this. We should not and cannot do anything about all of the artifacts that comes with the use of relative timeouts and schedule_timeout(). basically, using jiffies here (which schedule_timeout() does) is /fundamentally/ imprecise. If you get many interrupts, rounding errors sum up - and there's nothing we can do about it! Well I did convert futex_wait to an absolute timeout based version in the subsequent incremental patch. I think that is OK? it still has the rounding artifacts: using timer_list there is no way to do a precise long sleep based on many small sleeps. OK but that is nothing to do with my patch, but the original futex_wait implementation. even if this means more work for you (i'm sorry about that!) i'm quite sure we should take Sebastien's hrtimers based implementation of futex_wait(), and use the nanosleep method to restart it. There's no point in further tweaking the imprecise approach: whenever some timeout needs to be restarted, it's a candidate for hrtimers. Absolute timeout is needed, sure. But once that is done, hrtimers does not fix a bug, does it? until then, glibc already handles timeouts and restarts it manually. It isn't timeout handling that is buggy, but EINTR behaviour. And glibc does not handle that here. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
* Nick Piggin [EMAIL PROTECTED] wrote: even if this means more work for you (i'm sorry about that!) i'm quite sure we should take Sebastien's hrtimers based implementation of futex_wait(), and use the nanosleep method to restart it. There's no point in further tweaking the imprecise approach: whenever some timeout needs to be restarted, it's a candidate for hrtimers. Absolute timeout is needed, sure. But once that is done, hrtimers does not fix a bug, does it? the issue is this: your fix reduces the effects of the bug but it is still fundamentally incomplete because of the use of timer_list. So instead of trying to fix the bug the wrong way, please try to fix it the right way, ontop of an already existing and tested patch, ok? That also enables the other neat stuff Thomas talked about. until then, glibc already handles timeouts and restarts it manually. It isn't timeout handling that is buggy, but EINTR behaviour. And glibc does not handle that here. hm. I'm wondering how this wasnt noticed sooner - this futex_wait behavior has been there for like forever. Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
if HIGH_RES_TIMERS is disabled then that is what happens. But frankly, disabled? I would expect it (= more wakeups) when hrtimers are enabled. most futex waits are without timeouts - if an application cares about micro-effects like that then you are much better off not using a per-futex timeout anywa That sounds like you're arguing for not using hrtimers here because the applications shouldn't depend on precise timeouts here anyways?!? Anyways when you convert more kernel timeouts to hrtimers you should probably add some kind of batching facility that can be globally configured with a sysctl or similar. Then at least laptops (and possibly servers) can opt for more power saving again. For the futexes alone it probably won't matter too much agreed, but I see a trend to more hr. I also liked the idea (stolen from another popular OS) that a application can tell the OS how precise it wants its wakeups to be. -Andi - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
* Andi Kleen [EMAIL PROTECTED] wrote: if HIGH_RES_TIMERS is disabled then that is what happens. But frankly, disabled? I would expect it (= more wakeups) when hrtimers are enabled. i mean the groupping of timer expiries happens automatically when high-res is disabled. When high-res is asked for then, duh, it's enabled and you get precise timeouts ;-) most futex waits are without timeouts - if an application cares about micro-effects like that then you are much better off not using a per-futex timeout anywa That sounds like you're arguing for not using hrtimers here because the applications shouldn't depend on precise timeouts here anyways?!? I was talking about the micro-effect of grouping timer expiries. Anyways when you convert more kernel timeouts to hrtimers you should probably add some kind of batching facility that can be globally configured with a sysctl or similar. Then at least laptops (and possibly servers) can opt for more power saving again. For the futexes alone it probably won't matter too much agreed, but I see a trend to more hr. yeah, we had that in earlier versions, it's trivial - nobody used it. So i'll wait for the actual measurements and a patch (or i can do the patch too, if someone comes up with the measurements). I've added /proc/timer_stats and /proc/timer_info for exactly such reasons. ( note that we've added the facility for even more imprecise sleeps to the timer_list APIs - but for hrtimers it's a lot less clear-cut case. ) Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
On Mon, Mar 12, 2007 at 12:19:58PM +0100, Ingo Molnar wrote: * Nick Piggin [EMAIL PROTECTED] wrote: even if this means more work for you (i'm sorry about that!) i'm quite sure we should take Sebastien's hrtimers based implementation of futex_wait(), and use the nanosleep method to restart it. There's no point in further tweaking the imprecise approach: whenever some timeout needs to be restarted, it's a candidate for hrtimers. Absolute timeout is needed, sure. But once that is done, hrtimers does not fix a bug, does it? the issue is this: your fix reduces the effects of the bug but it is still fundamentally incomplete because of the use of timer_list. So But using schedule_timeout is not a bug. Userspace timeouts are always defined to be at least. instead of trying to fix the bug the wrong way, please try to fix it the right way, ontop of an already existing and tested patch, ok? That also enables the other neat stuff Thomas talked about. Well that's nice, but I have a bugfix here which probably needs to get backported to stable kernels and distro kernels. It should be just as easy to rebase the hrtimer patch on top of my fix. Considering that you've had it for a year, I don't think it needs to be added right before my fix. until then, glibc already handles timeouts and restarts it manually. It isn't timeout handling that is buggy, but EINTR behaviour. And glibc does not handle that here. hm. I'm wondering how this wasnt noticed sooner - this futex_wait behavior has been there for like forever. People ignore LTP test failures, and programs probably try to avoid exercising the nuances of the unix signal API, I guess. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
* Nick Piggin [EMAIL PROTECTED] wrote: the issue is this: your fix reduces the effects of the bug but it is still fundamentally incomplete because of the use of timer_list. So But using schedule_timeout is not a bug. Userspace timeouts are always defined to be at least. but what you are adding isnt a plain schedule_timeout(), it is a restart block handling loop. And for those restart blocks that relate to timeouts, we only use hrtimers. I am not making this up to annoy you: take a look at all the current restart block handlers - they are hrtimer based, for exactly this reason. instead of trying to fix the bug the wrong way, please try to fix it the right way, ontop of an already existing and tested patch, ok? That also enables the other neat stuff Thomas talked about. Well that's nice, but I have a bugfix here which probably needs to get backported to stable kernels and distro kernels. yes but your patch already exists for them which they can pick up. really, this is a common Linux principle: fix it completely and fix it the right way. You are applying it yourself on a daily basis when having the maintainer hat on =B-) It should be just as easy to rebase the hrtimer patch on top of my fix. Considering that you've had it for a year, I don't think it needs to be added right before my fix. your latest patch looks quite kludgy, exactly due to the issues that were mentioned. hm. I'm wondering how this wasnt noticed sooner - this futex_wait behavior has been there for like forever. People ignore LTP test failures, and programs probably try to avoid exercising the nuances of the unix signal API, I guess. then there's no rush and lets do this the right way, ok? Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
On Mon, Mar 12, 2007 at 12:38:29PM +0100, Ingo Molnar wrote: * Nick Piggin [EMAIL PROTECTED] wrote: the issue is this: your fix reduces the effects of the bug but it is still fundamentally incomplete because of the use of timer_list. So But using schedule_timeout is not a bug. Userspace timeouts are always defined to be at least. but what you are adding isnt a plain schedule_timeout(), it is a restart block handling loop. And for those restart blocks that relate to timeouts, we only use hrtimers. I am not making this up to annoy you: take a look at all the current restart block handlers - they are hrtimer based, for exactly this reason. So why do you say it is fundamentally incomplete? instead of trying to fix the bug the wrong way, please try to fix it the right way, ontop of an already existing and tested patch, ok? That also enables the other neat stuff Thomas talked about. Well that's nice, but I have a bugfix here which probably needs to get backported to stable kernels and distro kernels. yes but your patch already exists for them which they can pick up. really, this is a common Linux principle: fix it completely and fix it the right way. You are applying it yourself on a daily basis when having the maintainer hat on =B-) I still didn't get anything wrong pointed out with the patch, though. I'm not arguing against using hrtimers here to fix it the right way. It should be just as easy to rebase the hrtimer patch on top of my fix. Considering that you've had it for a year, I don't think it needs to be added right before my fix. your latest patch looks quite kludgy, exactly due to the issues that were mentioned. I don't see what is kludgy, unless you consider converting to and from absolute timeouts. But the userspace API is relative time based, so hrtimers doesn't change that. hm. I'm wondering how this wasnt noticed sooner - this futex_wait behavior has been there for like forever. People ignore LTP test failures, and programs probably try to avoid exercising the nuances of the unix signal API, I guess. then there's no rush and lets do this the right way, ok? There is no rush to use hrtimers. I would have thought it fairly important to actually reach correctness, though. We're not talking about completely changing the design of something such that it will take a lot of work to fix it properly. If that were the issue, then I would consider the hrtimer conversion as part of the fix. And if you talk about doing it the right way, then I don't think it is strictly the right way to reimplement the function, including known bugs, to be slightly more efficient, and *then* fixing those bugs. I'd actually consider it better to fix the bugs first, not only because of the backport issue, but because it generally makes it easier to track the injection and removal points of bugs in the history. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
* Nick Piggin [EMAIL PROTECTED] wrote: the issue is this: your fix reduces the effects of the bug but it is still fundamentally incomplete because of the use of timer_list. So But using schedule_timeout is not a bug. Userspace timeouts are always defined to be at least. but what you are adding isnt a plain schedule_timeout(), it is a restart block handling loop. And for those restart blocks that relate to timeouts, we only use hrtimers. I am not making this up to annoy you: take a look at all the current restart block handlers - they are hrtimer based, for exactly this reason. So why do you say it is fundamentally incomplete? because i misread your last patch :-) I thought it still has a window for inaccuracy, but you are right: it should be at most 1 jiffy inaccurate, no matter how many times we restart. still ... the hrtimers patch has been submitted to lkml before yours, and has been tested extensively, so why go the extra side-jump prolonging the jiffies sleep method? The LTP failure has been there since the inception of the futex code i suspect. Going this way also enables the addressing of a more pressing need: the elimination of glibc's forced use of relative futex timeouts. Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
On Mon, Mar 12, 2007 at 01:21:03PM +0100, Ingo Molnar wrote: * Nick Piggin [EMAIL PROTECTED] wrote: the issue is this: your fix reduces the effects of the bug but it is still fundamentally incomplete because of the use of timer_list. So But using schedule_timeout is not a bug. Userspace timeouts are always defined to be at least. but what you are adding isnt a plain schedule_timeout(), it is a restart block handling loop. And for those restart blocks that relate to timeouts, we only use hrtimers. I am not making this up to annoy you: take a look at all the current restart block handlers - they are hrtimer based, for exactly this reason. So why do you say it is fundamentally incomplete? because i misread your last patch :-) I thought it still has a window for inaccuracy, but you are right: it should be at most 1 jiffy inaccurate, no matter how many times we restart. OK, no problem. still ... the hrtimers patch has been submitted to lkml before yours, and has been tested extensively, so why go the extra side-jump prolonging the jiffies sleep method? The LTP failure has been there since the inception of the futex code i suspect. Going this way also enables the addressing of a more pressing need: the elimination of glibc's forced use of relative futex timeouts. I guess my arguments are that my patch fixes a bug, which gives it a higher priority (being a userspace API bug, perhaps even 2.6.21); and that it will want to be backported while the hrtimer patch will not, so including the hrtimer patch first means 2 different patches to fix the same bug. I'm not trying to make life harder for the hrtimer patch. I will even volunteer to forward port it on top of the restart fix, if that is an issue. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
On Mon, Mar 12, 2007 at 11:58:26AM +0100, Andi Kleen wrote: On Mon, Mar 12, 2007 at 12:00:20PM +0100, Thomas Gleixner wrote: On Mon, 2007-03-12 at 12:27 +0100, Andi Kleen wrote: Ingo Molnar [EMAIL PROTECTED] writes: the only correct approach is the use of hrtimers, and a patch exists for that - see below. This has been included in -rt for quite some time. But isn't that bad for power management? You'll likely get more idle wakeups, won't you? Why so ? It comes more precise, but only once. When it's clustered around the jiffies interval then wakeups from multiple processes will be somewhat batched. With a precise wakeup you'll get wakeups all over the jiffies period, won't you? What we probably need in the long-term, and not just for high precision wakeups, is we need a way for waiters (either in the kernel or in userspace) to specify a desired precision in their timers. Is it, wake me up in a second, exactly, or wake me up in a second, plus or minus 10ms? (or 50ms? or 100ms?). This becomes especially important if we want the tickless code to really shine as far as power management is concerned. Unfortunately, the POSIX timer abstraction doesn't give this kind of flexibility easily, so it's going to be a while before we see significant userspace adoption of such a kernel feature, but I think it's something that would be still worthwhile to add. Regards, - Ted - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
* Theodore Tso [EMAIL PROTECTED] wrote: What we probably need in the long-term, and not just for high precision wakeups, is we need a way for waiters (either in the kernel or in userspace) to specify a desired precision in their timers. Is it, wake me up in a second, exactly, or wake me up in a second, plus or minus 10ms? (or 50ms? or 100ms?). such a facility exists already, see round_jiffies() and round_jiffies_relative(). There's some short blurb about it at: http://kernelnewbies.org/LinuxChanges#head-513ceda14f5d8cf5b8a7c81d7e3821543141ecb0 This becomes especially important if we want the tickless code to really shine as far as power management is concerned. [...] yes. That's why we also implemented /proc/timer_stat, and this was measured and a few higher-frequency fuzzy waiters were converted to use round_jiffies(). Some other waiters were fixed in user-space. It's all dependent on actual measurements and circumstances. Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
On Mon, Mar 12, 2007 at 10:12:14AM -0400, Theodore Tso wrote: On Mon, Mar 12, 2007 at 11:58:26AM +0100, Andi Kleen wrote: On Mon, Mar 12, 2007 at 12:00:20PM +0100, Thomas Gleixner wrote: On Mon, 2007-03-12 at 12:27 +0100, Andi Kleen wrote: Ingo Molnar [EMAIL PROTECTED] writes: the only correct approach is the use of hrtimers, and a patch exists for that - see below. This has been included in -rt for quite some time. But isn't that bad for power management? You'll likely get more idle wakeups, won't you? Why so ? It comes more precise, but only once. When it's clustered around the jiffies interval then wakeups from multiple processes will be somewhat batched. With a precise wakeup you'll get wakeups all over the jiffies period, won't you? What we probably need in the long-term, and not just for high precision wakeups, is we need a way for waiters (either in the kernel or in userspace) to specify a desired precision in their timers. Is it, wake me up in a second, exactly, or wake me up in a second, plus or minus 10ms? (or 50ms? or 100ms?). Would this work, or will it just create more confusion for the API user? I mean, all sleeps can only guarantee no less than. Would it be enough for a binary (exact as possible / relaxed if needed) flag? Or perhaps ternary (exact/relaxed/batched), where relaxed could add an extra jiffy or so, and batched is really relaxed that may delay up to double the value of the timeout. This becomes especially important if we want the tickless code to really shine as far as power management is concerned. Unfortunately, the POSIX timer abstraction doesn't give this kind of flexibility easily, so it's going to be a while before we see significant userspace adoption of such a kernel feature, but I think it's something that would be still worthwhile to add. But given that we know most userspace API timeouts are broadly just an equal to or greater, then we could add another timeout flag to specify it is a userspace timeout, and make that controllable by sysctl. Sure it isn't ideal, but for those who really want power / hypervisor savings, it could be useful. BTW. my futex man page says timeout's contents describe the maximum duration of the wait. Surely that should be *minimum*? Michael cc'ed. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] change futex_wait() to hrtimers
This becomes especially important if we want the tickless code to really shine as far as power management is concerned. Unfortunately, the POSIX timer abstraction doesn't give this kind of flexibility easily, so it's going to be a while before we see significant userspace adoption of such a kernel feature, but I think it's something that would be still worthwhile to add. I suspect it would be overkill to specify that on every sleep operation. 99% of applications won't care and for the 1% leftover a global per process setting should be fine. I think a single prctl() and a global sysctl as default would be enough. -Andi - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/