Re: [patch] change futex_wait() to hrtimers

2007-03-14 Thread linux
> BTW. my futex man page says timeout's contents "describe the maximum duration
> of the wait". Surely that should be *minimum*? Michael cc'ed.

Er, the intent of the wording is to say "futex will wait until uaddr
no longer contains val, or the timeout expires, whichever happens first".


One option for selecting different clock resolutions is to use the
clockid_t from the POSIX clock_gettime() family.  That is, specify the
clock that a wait uses, and then have a separate mechanism for turning
a resolution requirement into a clockid_t.

(And there can be default clocks for interfaces that don't specify one
explicitly.)

Although clockid_t is pretty generic, it's biased toward an enumerated
list of clocks rather than a continuous resolution.  Fortunately,
that seems to match the implementation ideas.  The question is how
much the timeout gets rounded, and the choices are currently jiffies
or microseconds.

A related option may be whether rounding down is acceptable.  For some
applications (periodic polling for events), it's fine.  For others,
it's not.  Thus, while it's okay to specify such clocks explicitly,
it'd probably be a good idea to forbid selecting them as the default
for interfaces that don't specify a clock explicitly.

I had some code that suffered 1 ms buzz-loops on Solaris because poll(2)
would round the timeout interval down, but the loop calling it would
explicitly check whether the timeout had expired using gettimeofday()
and would keep re-invoking poll(pollfds, npollfds, 1) until the timeout
really did expire.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-14 Thread linux
 BTW. my futex man page says timeout's contents describe the maximum duration
 of the wait. Surely that should be *minimum*? Michael cc'ed.

Er, the intent of the wording is to say futex will wait until uaddr
no longer contains val, or the timeout expires, whichever happens first.


One option for selecting different clock resolutions is to use the
clockid_t from the POSIX clock_gettime() family.  That is, specify the
clock that a wait uses, and then have a separate mechanism for turning
a resolution requirement into a clockid_t.

(And there can be default clocks for interfaces that don't specify one
explicitly.)

Although clockid_t is pretty generic, it's biased toward an enumerated
list of clocks rather than a continuous resolution.  Fortunately,
that seems to match the implementation ideas.  The question is how
much the timeout gets rounded, and the choices are currently jiffies
or microseconds.

A related option may be whether rounding down is acceptable.  For some
applications (periodic polling for events), it's fine.  For others,
it's not.  Thus, while it's okay to specify such clocks explicitly,
it'd probably be a good idea to forbid selecting them as the default
for interfaces that don't specify a clock explicitly.

I had some code that suffered 1 ms buzz-loops on Solaris because poll(2)
would round the timeout interval down, but the loop calling it would
explicitly check whether the timeout had expired using gettimeofday()
and would keep re-invoking poll(pollfds, npollfds, 1) until the timeout
really did expire.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Ingo Molnar

* Theodore Tso <[EMAIL PROTECTED]> wrote:

> What we probably need in the long-term, and not just for high 
> precision wakeups, is we need a way for waiters (either in the kernel 
> or in userspace) to specify a desired precision in their timers.  Is 
> it, "wake me up in a second, exactly", or "wake me up in a second, 
> plus or minus 10ms"?  (or 50ms?  or 100ms?).

such a facility exists already, see round_jiffies() and 
round_jiffies_relative(). There's some short blurb about it at:

  
http://kernelnewbies.org/LinuxChanges#head-513ceda14f5d8cf5b8a7c81d7e3821543141ecb0

> This becomes especially important if we want the tickless code to 
> really shine as far as power management is concerned. [...]

yes. That's why we also implemented /proc/timer_stat, and this was 
measured and a few higher-frequency fuzzy waiters were converted to use 
round_jiffies(). Some other waiters were fixed in user-space. It's all 
dependent on actual measurements and circumstances.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Nick Piggin
On Mon, Mar 12, 2007 at 10:12:14AM -0400, Theodore Tso wrote:
> On Mon, Mar 12, 2007 at 11:58:26AM +0100, Andi Kleen wrote:
> > On Mon, Mar 12, 2007 at 12:00:20PM +0100, Thomas Gleixner wrote:
> > > On Mon, 2007-03-12 at 12:27 +0100, Andi Kleen wrote:
> > > > Ingo Molnar <[EMAIL PROTECTED]> writes:
> > > > > 
> > > > > the only correct approach is the use of hrtimers, and a patch exists 
> > > > > for 
> > > > > that - see below. This has been included in -rt for quite some time.
> > > > 
> > > > But isn't that bad for power management? You'll likely get more
> > > > idle wakeups, won't you?
> > > 
> > > Why so ? It comes more precise, but only once.
> > 
> > When it's clustered around the jiffies interval then wakeups from
> > multiple processes will be somewhat batched. With a precise wakeup you'll
> > get wakeups all over the jiffies period, won't you?
> 
> What we probably need in the long-term, and not just for high
> precision wakeups, is we need a way for waiters (either in the kernel
> or in userspace) to specify a desired precision in their timers.  Is
> it, "wake me up in a second, exactly", or "wake me up in a second,
> plus or minus 10ms"?   (or 50ms?  or 100ms?).

Would this work, or will it just create more confusion for the API user?
I mean, all sleeps can only guarantee "no less than".

Would it be enough for a binary (exact as possible / relaxed if needed)
flag? Or perhaps ternary (exact/relaxed/batched), where relaxed could
add an extra jiffy or so, and batched is really relaxed that may delay
up to double the value of the timeout.

> This becomes especially important if we want the tickless code to
> really shine as far as power management is concerned.  Unfortunately,
> the POSIX timer abstraction doesn't give this kind of flexibility
> easily, so it's going to be a while before we see significant
> userspace adoption of such a kernel feature, but I think it's
> something that would be still worthwhile to add.

But given that we know most userspace API timeouts are broadly just an
"equal to or greater", then we could add another timeout flag to specify
it is a userspace timeout, and make that controllable by sysctl.

Sure it isn't ideal, but for those who really want power / hypervisor
savings, it could be useful.

BTW. my futex man page says timeout's contents "describe the maximum duration
of the wait". Surely that should be *minimum*? Michael cc'ed.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Andi Kleen
> This becomes especially important if we want the tickless code to
> really shine as far as power management is concerned.  Unfortunately,
> the POSIX timer abstraction doesn't give this kind of flexibility
> easily, so it's going to be a while before we see significant
> userspace adoption of such a kernel feature, but I think it's
> something that would be still worthwhile to add.

I suspect it would be overkill to specify that on every sleep operation.
99% of applications won't care and for the 1% leftover a global per 
process setting should be fine.
I think a single prctl() and a global sysctl as default would be enough.

-Andi

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Theodore Tso
On Mon, Mar 12, 2007 at 11:58:26AM +0100, Andi Kleen wrote:
> On Mon, Mar 12, 2007 at 12:00:20PM +0100, Thomas Gleixner wrote:
> > On Mon, 2007-03-12 at 12:27 +0100, Andi Kleen wrote:
> > > Ingo Molnar <[EMAIL PROTECTED]> writes:
> > > > 
> > > > the only correct approach is the use of hrtimers, and a patch exists 
> > > > for 
> > > > that - see below. This has been included in -rt for quite some time.
> > > 
> > > But isn't that bad for power management? You'll likely get more
> > > idle wakeups, won't you?
> > 
> > Why so ? It comes more precise, but only once.
> 
> When it's clustered around the jiffies interval then wakeups from
> multiple processes will be somewhat batched. With a precise wakeup you'll
> get wakeups all over the jiffies period, won't you?

What we probably need in the long-term, and not just for high
precision wakeups, is we need a way for waiters (either in the kernel
or in userspace) to specify a desired precision in their timers.  Is
it, "wake me up in a second, exactly", or "wake me up in a second,
plus or minus 10ms"?   (or 50ms?  or 100ms?).

This becomes especially important if we want the tickless code to
really shine as far as power management is concerned.  Unfortunately,
the POSIX timer abstraction doesn't give this kind of flexibility
easily, so it's going to be a while before we see significant
userspace adoption of such a kernel feature, but I think it's
something that would be still worthwhile to add.

Regards,

- Ted
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Nick Piggin
On Mon, Mar 12, 2007 at 01:21:03PM +0100, Ingo Molnar wrote:
> 
> * Nick Piggin <[EMAIL PROTECTED]> wrote:
> 
> > > > > the issue is this: your fix reduces the effects of the bug but 
> > > > > it is still fundamentally incomplete because of the use of 
> > > > > timer_list. So
> > > > 
> > > > But using schedule_timeout is not a bug. Userspace timeouts are 
> > > > always defined to be "at least".
> > > 
> > > but what you are adding isnt a plain schedule_timeout(), it is a 
> > > restart block handling loop. And for those restart blocks that 
> > > relate to timeouts, we only use hrtimers. I am not making this up to 
> > > annoy you: take a look at all the current restart block handlers - 
> > > they are hrtimer based, for exactly this reason.
> > 
> > So why do you say it is fundamentally incomplete?
> 
> because i misread your last patch :-) I thought it still has a window 
> for inaccuracy, but you are right: it should be at most 1 jiffy 
> inaccurate, no matter how many times we restart.

OK, no problem.

> still ... the hrtimers patch has been submitted to lkml before yours, 
> and has been tested extensively, so why go the extra side-jump 
> prolonging the jiffies sleep method? The LTP failure has been there 
> since the inception of the futex code i suspect. Going this way also 
> enables the addressing of a more pressing need: the elimination of 
> glibc's forced use of relative futex timeouts.

I guess my arguments are that my patch fixes a bug, which gives it a
higher priority (being a userspace API bug, perhaps even 2.6.21); and
that it will want to be backported while the hrtimer patch will not, so
including the hrtimer patch first means 2 different patches to fix the
same bug.

I'm not trying to make life harder for the hrtimer patch. I will even
volunteer to forward port it on top of the restart fix, if that is an
issue.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Ingo Molnar

* Nick Piggin <[EMAIL PROTECTED]> wrote:

> > > > the issue is this: your fix reduces the effects of the bug but 
> > > > it is still fundamentally incomplete because of the use of 
> > > > timer_list. So
> > > 
> > > But using schedule_timeout is not a bug. Userspace timeouts are 
> > > always defined to be "at least".
> > 
> > but what you are adding isnt a plain schedule_timeout(), it is a 
> > restart block handling loop. And for those restart blocks that 
> > relate to timeouts, we only use hrtimers. I am not making this up to 
> > annoy you: take a look at all the current restart block handlers - 
> > they are hrtimer based, for exactly this reason.
> 
> So why do you say it is fundamentally incomplete?

because i misread your last patch :-) I thought it still has a window 
for inaccuracy, but you are right: it should be at most 1 jiffy 
inaccurate, no matter how many times we restart.

still ... the hrtimers patch has been submitted to lkml before yours, 
and has been tested extensively, so why go the extra side-jump 
prolonging the jiffies sleep method? The LTP failure has been there 
since the inception of the futex code i suspect. Going this way also 
enables the addressing of a more pressing need: the elimination of 
glibc's forced use of relative futex timeouts.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Nick Piggin
On Mon, Mar 12, 2007 at 12:38:29PM +0100, Ingo Molnar wrote:
> 
> * Nick Piggin <[EMAIL PROTECTED]> wrote:
> 
> > > the issue is this: your fix reduces the effects of the bug but it is 
> > > still fundamentally incomplete because of the use of timer_list. So
> > 
> > But using schedule_timeout is not a bug. Userspace timeouts are always 
> > defined to be "at least".
> 
> but what you are adding isnt a plain schedule_timeout(), it is a restart 
> block handling loop. And for those restart blocks that relate to 
> timeouts, we only use hrtimers. I am not making this up to annoy you: 
> take a look at all the current restart block handlers - they are hrtimer 
> based, for exactly this reason.

So why do you say it is fundamentally incomplete?

> > > instead of trying to fix the bug the wrong way, please try to fix it 
> > > the right way, ontop of an already existing and tested patch, ok? 
> > > That also enables the other neat stuff Thomas talked about.
> > 
> > Well that's nice, but I have a bugfix here which probably needs to get 
> > backported to stable kernels and distro kernels.
> 
> yes but your patch already exists for them which they can pick up.
> 
> really, this is a common Linux principle: fix it completely and fix it 
> the right way. You are applying it yourself on a daily basis when having 
> the maintainer hat on =B-)

I still didn't get anything wrong pointed out with the patch, though.

I'm not arguing against using hrtimers here to fix it the "right way".

> > It should be just as easy to rebase the hrtimer patch on top of my 
> > fix. Considering that you've had it for a year, I don't think it needs 
> > to be added right before my fix.
> 
> your latest patch looks quite kludgy, exactly due to the issues that 
> were mentioned.

I don't see what is kludgy, unless you consider converting to and from
absolute timeouts. But the userspace API is relative time based, so
hrtimers doesn't change that.

> > > hm. I'm wondering how this wasnt noticed sooner - this futex_wait 
> > > behavior has been there for like forever.
> > 
> > People ignore LTP test failures, and programs probably try to avoid 
> > exercising the nuances of the unix signal API, I guess.
> 
> then there's no rush and lets do this the right way, ok?

There is no rush to use hrtimers. I would have thought it fairly important
to actually reach correctness, though. We're not talking about completely
changing the design of something such that it will take a lot of work to
"fix it properly". If that were the issue, then I would consider the
hrtimer conversion as part of the fix.

And if you talk about doing it the right way, then I don't think it is
strictly the right way to reimplement the function, including known bugs,
to be slightly more efficient, and *then* fixing those bugs. I'd actually
consider it better to fix the bugs first, not only because of the backport
issue, but because it generally makes it easier to track the injection and
removal points of bugs in the history.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Ingo Molnar

* Nick Piggin <[EMAIL PROTECTED]> wrote:

> > the issue is this: your fix reduces the effects of the bug but it is 
> > still fundamentally incomplete because of the use of timer_list. So
> 
> But using schedule_timeout is not a bug. Userspace timeouts are always 
> defined to be "at least".

but what you are adding isnt a plain schedule_timeout(), it is a restart 
block handling loop. And for those restart blocks that relate to 
timeouts, we only use hrtimers. I am not making this up to annoy you: 
take a look at all the current restart block handlers - they are hrtimer 
based, for exactly this reason.

> > instead of trying to fix the bug the wrong way, please try to fix it 
> > the right way, ontop of an already existing and tested patch, ok? 
> > That also enables the other neat stuff Thomas talked about.
> 
> Well that's nice, but I have a bugfix here which probably needs to get 
> backported to stable kernels and distro kernels.

yes but your patch already exists for them which they can pick up.

really, this is a common Linux principle: fix it completely and fix it 
the right way. You are applying it yourself on a daily basis when having 
the maintainer hat on =B-)

> It should be just as easy to rebase the hrtimer patch on top of my 
> fix. Considering that you've had it for a year, I don't think it needs 
> to be added right before my fix.

your latest patch looks quite kludgy, exactly due to the issues that 
were mentioned.

> > hm. I'm wondering how this wasnt noticed sooner - this futex_wait 
> > behavior has been there for like forever.
> 
> People ignore LTP test failures, and programs probably try to avoid 
> exercising the nuances of the unix signal API, I guess.

then there's no rush and lets do this the right way, ok?

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Ingo Molnar

* Andi Kleen <[EMAIL PROTECTED]> wrote:

> > if HIGH_RES_TIMERS is disabled then that is what happens. But 
> > frankly,
> 
> disabled? I would expect it (= more wakeups) when hrtimers are 
> enabled.

i mean the groupping of timer expiries happens automatically when 
high-res is disabled. When high-res is asked for then, duh, it's enabled 
and you get precise timeouts ;-)

> > most futex waits are without timeouts - if an application cares 
> > about micro-effects like that then you are much better off not using 
> > a per-futex timeout anywa
> 
> That sounds like you're arguing for not using hrtimers here because 
> the applications shouldn't depend on precise timeouts here anyways?!?

I was talking about the "micro-effect" of grouping timer expiries.

> Anyways when you convert more kernel timeouts to hrtimers you should 
> probably add some kind of batching facility that can be globally 
> configured with a sysctl or similar. Then at least laptops (and 
> possibly servers) can opt for more power saving again. For the futexes 
> alone it probably won't matter too much agreed, but I see a trend to 
> more hr.

yeah, we had that in earlier versions, it's trivial - nobody used it. So 
i'll wait for the actual measurements and a patch (or i can do the patch 
too, if someone comes up with the measurements). I've added 
/proc/timer_stats and /proc/timer_info for exactly such reasons.

( note that we've added the facility for even more imprecise sleeps to 
  the timer_list APIs - but for hrtimers it's a lot less clear-cut case. )

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Nick Piggin
On Mon, Mar 12, 2007 at 12:19:58PM +0100, Ingo Molnar wrote:
> 
> * Nick Piggin <[EMAIL PROTECTED]> wrote:
> 
> > > even if this means more work for you (i'm sorry about that!) i'm 
> > > quite sure we should take Sebastien's hrtimers based implementation 
> > > of futex_wait(), and use the nanosleep method to restart it. There's 
> > > no point in further tweaking the imprecise approach: whenever some 
> > > timeout needs to be restarted, it's a candidate for hrtimers.
> > 
> > Absolute timeout is needed, sure. But once that is done, hrtimers does 
> > not fix a bug, does it?
> 
> the issue is this: your fix reduces the effects of the bug but it is 
> still fundamentally incomplete because of the use of timer_list. So 

But using schedule_timeout is not a bug. Userspace timeouts are always
defined to be "at least".

> instead of trying to fix the bug the wrong way, please try to fix it the 
> right way, ontop of an already existing and tested patch, ok? That also 
> enables the other neat stuff Thomas talked about.

Well that's nice, but I have a bugfix here which probably needs to
get backported to stable kernels and distro kernels.

It should be just as easy to rebase the hrtimer patch on top of my
fix. Considering that you've had it for a year, I don't think it 
needs to be added right before my fix.

> > > until then, glibc already handles timeouts and restarts it manually.
> > 
> > It isn't timeout handling that is buggy, but EINTR behaviour. And 
> > glibc does not handle that here.
> 
> hm. I'm wondering how this wasnt noticed sooner - this futex_wait 
> behavior has been there for like forever.

People ignore LTP test failures, and programs probably try to avoid
exercising the nuances of the unix signal API, I guess.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Andi Kleen
> if HIGH_RES_TIMERS is disabled then that is what happens. But frankly, 

disabled? I would expect it (= more wakeups) when hrtimers are enabled.

> most futex waits are without timeouts - if an application cares about 
> micro-effects like that then you are much better off not using a 
> per-futex timeout anywa

That sounds like you're arguing for not using hrtimers here because
the applications shouldn't depend on precise timeouts here anyways?!? 

Anyways when you convert more kernel timeouts to hrtimers you should
probably add some kind of batching facility that can be globally
configured with a sysctl or similar. Then at least laptops (and possibly
servers) can opt for more power saving again. For the futexes alone
it probably won't matter too much agreed, but I see a trend to more hr.

I also liked the idea (stolen from another popular OS) that a application
can tell the OS how precise it wants its wakeups to be.

-Andi

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Ingo Molnar

* Nick Piggin <[EMAIL PROTECTED]> wrote:

> > even if this means more work for you (i'm sorry about that!) i'm 
> > quite sure we should take Sebastien's hrtimers based implementation 
> > of futex_wait(), and use the nanosleep method to restart it. There's 
> > no point in further tweaking the imprecise approach: whenever some 
> > timeout needs to be restarted, it's a candidate for hrtimers.
> 
> Absolute timeout is needed, sure. But once that is done, hrtimers does 
> not fix a bug, does it?

the issue is this: your fix reduces the effects of the bug but it is 
still fundamentally incomplete because of the use of timer_list. So 
instead of trying to fix the bug the wrong way, please try to fix it the 
right way, ontop of an already existing and tested patch, ok? That also 
enables the other neat stuff Thomas talked about.

> > until then, glibc already handles timeouts and restarts it manually.
> 
> It isn't timeout handling that is buggy, but EINTR behaviour. And 
> glibc does not handle that here.

hm. I'm wondering how this wasnt noticed sooner - this futex_wait 
behavior has been there for like forever.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Thomas Gleixner
On Mon, 2007-03-12 at 12:02 +0100, Ingo Molnar wrote:
> > Well I did convert futex_wait to an absolute timeout based version in 
> > the subsequent incremental patch. I think that is OK?
> 
> it still has the rounding artifacts: using timer_list there is no way to 
> do a precise long sleep based on many small sleeps.
> 
> even if this means more work for you (i'm sorry about that!) i'm quite 
> sure we should take Sebastien's hrtimers based implementation of 
> futex_wait(), and use the nanosleep method to restart it. There's no 
> point in further tweaking the imprecise approach: whenever some timeout 
> needs to be restarted, it's a candidate for hrtimers.
> 
> until then, glibc already handles timeouts and restarts it manually.

This also allows us to add a seperate absolute time bases futex op,
which allows to remove the conversion of abstime to reltime in glibc.

tglx


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Nick Piggin
On Mon, Mar 12, 2007 at 12:02:04PM +0100, Ingo Molnar wrote:
> 
> * Nick Piggin <[EMAIL PROTECTED]> wrote:
> 
> > > i dont think we should try to do this. We should not and cannot do 
> > > anything about all of the artifacts that comes with the use of 
> > > relative timeouts and schedule_timeout().
> > > 
> > > basically, using jiffies here (which schedule_timeout() does) is 
> > > /fundamentally/ imprecise. If you get many interrupts, rounding 
> > > errors sum up - and there's nothing we can do about it!
> > 
> > Well I did convert futex_wait to an absolute timeout based version in 
> > the subsequent incremental patch. I think that is OK?
> 
> it still has the rounding artifacts: using timer_list there is no way to 
> do a precise long sleep based on many small sleeps.

OK but that is nothing to do with my patch, but the original futex_wait
implementation.

> even if this means more work for you (i'm sorry about that!) i'm quite 
> sure we should take Sebastien's hrtimers based implementation of 
> futex_wait(), and use the nanosleep method to restart it. There's no 
> point in further tweaking the imprecise approach: whenever some timeout 
> needs to be restarted, it's a candidate for hrtimers.

Absolute timeout is needed, sure. But once that is done, hrtimers does
not fix a bug, does it?

> 
> until then, glibc already handles timeouts and restarts it manually.

It isn't timeout handling that is buggy, but EINTR behaviour. And
glibc does not handle that here.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Ingo Molnar

* Andi Kleen <[EMAIL PROTECTED]> wrote:

> On Mon, Mar 12, 2007 at 12:00:20PM +0100, Thomas Gleixner wrote:
> > On Mon, 2007-03-12 at 12:27 +0100, Andi Kleen wrote:
> > > Ingo Molnar <[EMAIL PROTECTED]> writes:
> > > > 
> > > > the only correct approach is the use of hrtimers, and a patch exists 
> > > > for 
> > > > that - see below. This has been included in -rt for quite some time.
> > > 
> > > But isn't that bad for power management? You'll likely get more
> > > idle wakeups, won't you?
> > 
> > Why so ? It comes more precise, but only once.
> 
> When it's clustered around the jiffies interval then wakeups from 
> multiple processes will be somewhat batched. With a precise wakeup 
> you'll get wakeups all over the jiffies period, won't you?

if HIGH_RES_TIMERS is disabled then that is what happens. But frankly, 
most futex waits are without timeouts - if an application cares about 
micro-effects like that then you are much better off not using a 
per-futex timeout anyway.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Ingo Molnar

* Nick Piggin <[EMAIL PROTECTED]> wrote:

> > i dont think we should try to do this. We should not and cannot do 
> > anything about all of the artifacts that comes with the use of 
> > relative timeouts and schedule_timeout().
> > 
> > basically, using jiffies here (which schedule_timeout() does) is 
> > /fundamentally/ imprecise. If you get many interrupts, rounding 
> > errors sum up - and there's nothing we can do about it!
> 
> Well I did convert futex_wait to an absolute timeout based version in 
> the subsequent incremental patch. I think that is OK?

it still has the rounding artifacts: using timer_list there is no way to 
do a precise long sleep based on many small sleeps.

even if this means more work for you (i'm sorry about that!) i'm quite 
sure we should take Sebastien's hrtimers based implementation of 
futex_wait(), and use the nanosleep method to restart it. There's no 
point in further tweaking the imprecise approach: whenever some timeout 
needs to be restarted, it's a candidate for hrtimers.

until then, glibc already handles timeouts and restarts it manually.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Andi Kleen
On Mon, Mar 12, 2007 at 12:00:20PM +0100, Thomas Gleixner wrote:
> On Mon, 2007-03-12 at 12:27 +0100, Andi Kleen wrote:
> > Ingo Molnar <[EMAIL PROTECTED]> writes:
> > > 
> > > the only correct approach is the use of hrtimers, and a patch exists for 
> > > that - see below. This has been included in -rt for quite some time.
> > 
> > But isn't that bad for power management? You'll likely get more
> > idle wakeups, won't you?
> 
> Why so ? It comes more precise, but only once.

When it's clustered around the jiffies interval then wakeups from
multiple processes will be somewhat batched. With a precise wakeup you'll
get wakeups all over the jiffies period, won't you?

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Thomas Gleixner
On Mon, 2007-03-12 at 12:27 +0100, Andi Kleen wrote:
> Ingo Molnar <[EMAIL PROTECTED]> writes:
> > 
> > the only correct approach is the use of hrtimers, and a patch exists for 
> > that - see below. This has been included in -rt for quite some time.
> 
> But isn't that bad for power management? You'll likely get more
> idle wakeups, won't you?

Why so ? It comes more precise, but only once.

tglx


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Andi Kleen
Ingo Molnar <[EMAIL PROTECTED]> writes:
> 
> the only correct approach is the use of hrtimers, and a patch exists for 
> that - see below. This has been included in -rt for quite some time.

But isn't that bad for power management? You'll likely get more
idle wakeups, won't you?

-Andi

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Nick Piggin
On Mon, Mar 12, 2007 at 10:10:06AM +0100, Ingo Molnar wrote:
> 
> * Roland McGrath <[EMAIL PROTECTED]> wrote:
> 
> > I agree it should restart.  But I don't think this is quite right in 
> > the timeout case.  It will increase the total maximum real time spent 
> > arbitrarily by the amount of time elapsed in signal handlers.  Other 
> > restartable, timed calls have to convert to an absolute timeout for 
> > the restart block (and convert back when doing the restart).
> 
> i dont think we should try to do this. We should not and cannot do 
> anything about all of the artifacts that comes with the use of relative 
> timeouts and schedule_timeout().
> 
> basically, using jiffies here (which schedule_timeout() does) is 
> /fundamentally/ imprecise. If you get many interrupts, rounding errors 
> sum up - and there's nothing we can do about it!

Well I did convert futex_wait to an absolute timeout based version in
the subsequent incremental patch. I think that is OK?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Nick Piggin
On Mon, Mar 12, 2007 at 10:10:06AM +0100, Ingo Molnar wrote:
 
 * Roland McGrath [EMAIL PROTECTED] wrote:
 
  I agree it should restart.  But I don't think this is quite right in 
  the timeout case.  It will increase the total maximum real time spent 
  arbitrarily by the amount of time elapsed in signal handlers.  Other 
  restartable, timed calls have to convert to an absolute timeout for 
  the restart block (and convert back when doing the restart).
 
 i dont think we should try to do this. We should not and cannot do 
 anything about all of the artifacts that comes with the use of relative 
 timeouts and schedule_timeout().
 
 basically, using jiffies here (which schedule_timeout() does) is 
 /fundamentally/ imprecise. If you get many interrupts, rounding errors 
 sum up - and there's nothing we can do about it!

Well I did convert futex_wait to an absolute timeout based version in
the subsequent incremental patch. I think that is OK?
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Andi Kleen
Ingo Molnar [EMAIL PROTECTED] writes:
 
 the only correct approach is the use of hrtimers, and a patch exists for 
 that - see below. This has been included in -rt for quite some time.

But isn't that bad for power management? You'll likely get more
idle wakeups, won't you?

-Andi

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Thomas Gleixner
On Mon, 2007-03-12 at 12:27 +0100, Andi Kleen wrote:
 Ingo Molnar [EMAIL PROTECTED] writes:
  
  the only correct approach is the use of hrtimers, and a patch exists for 
  that - see below. This has been included in -rt for quite some time.
 
 But isn't that bad for power management? You'll likely get more
 idle wakeups, won't you?

Why so ? It comes more precise, but only once.

tglx


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Andi Kleen
On Mon, Mar 12, 2007 at 12:00:20PM +0100, Thomas Gleixner wrote:
 On Mon, 2007-03-12 at 12:27 +0100, Andi Kleen wrote:
  Ingo Molnar [EMAIL PROTECTED] writes:
   
   the only correct approach is the use of hrtimers, and a patch exists for 
   that - see below. This has been included in -rt for quite some time.
  
  But isn't that bad for power management? You'll likely get more
  idle wakeups, won't you?
 
 Why so ? It comes more precise, but only once.

When it's clustered around the jiffies interval then wakeups from
multiple processes will be somewhat batched. With a precise wakeup you'll
get wakeups all over the jiffies period, won't you?

-Andi
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Ingo Molnar

* Nick Piggin [EMAIL PROTECTED] wrote:

  i dont think we should try to do this. We should not and cannot do 
  anything about all of the artifacts that comes with the use of 
  relative timeouts and schedule_timeout().
  
  basically, using jiffies here (which schedule_timeout() does) is 
  /fundamentally/ imprecise. If you get many interrupts, rounding 
  errors sum up - and there's nothing we can do about it!
 
 Well I did convert futex_wait to an absolute timeout based version in 
 the subsequent incremental patch. I think that is OK?

it still has the rounding artifacts: using timer_list there is no way to 
do a precise long sleep based on many small sleeps.

even if this means more work for you (i'm sorry about that!) i'm quite 
sure we should take Sebastien's hrtimers based implementation of 
futex_wait(), and use the nanosleep method to restart it. There's no 
point in further tweaking the imprecise approach: whenever some timeout 
needs to be restarted, it's a candidate for hrtimers.

until then, glibc already handles timeouts and restarts it manually.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Ingo Molnar

* Andi Kleen [EMAIL PROTECTED] wrote:

 On Mon, Mar 12, 2007 at 12:00:20PM +0100, Thomas Gleixner wrote:
  On Mon, 2007-03-12 at 12:27 +0100, Andi Kleen wrote:
   Ingo Molnar [EMAIL PROTECTED] writes:

the only correct approach is the use of hrtimers, and a patch exists 
for 
that - see below. This has been included in -rt for quite some time.
   
   But isn't that bad for power management? You'll likely get more
   idle wakeups, won't you?
  
  Why so ? It comes more precise, but only once.
 
 When it's clustered around the jiffies interval then wakeups from 
 multiple processes will be somewhat batched. With a precise wakeup 
 you'll get wakeups all over the jiffies period, won't you?

if HIGH_RES_TIMERS is disabled then that is what happens. But frankly, 
most futex waits are without timeouts - if an application cares about 
micro-effects like that then you are much better off not using a 
per-futex timeout anyway.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Thomas Gleixner
On Mon, 2007-03-12 at 12:02 +0100, Ingo Molnar wrote:
  Well I did convert futex_wait to an absolute timeout based version in 
  the subsequent incremental patch. I think that is OK?
 
 it still has the rounding artifacts: using timer_list there is no way to 
 do a precise long sleep based on many small sleeps.
 
 even if this means more work for you (i'm sorry about that!) i'm quite 
 sure we should take Sebastien's hrtimers based implementation of 
 futex_wait(), and use the nanosleep method to restart it. There's no 
 point in further tweaking the imprecise approach: whenever some timeout 
 needs to be restarted, it's a candidate for hrtimers.
 
 until then, glibc already handles timeouts and restarts it manually.

This also allows us to add a seperate absolute time bases futex op,
which allows to remove the conversion of abstime to reltime in glibc.

tglx


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Nick Piggin
On Mon, Mar 12, 2007 at 12:02:04PM +0100, Ingo Molnar wrote:
 
 * Nick Piggin [EMAIL PROTECTED] wrote:
 
   i dont think we should try to do this. We should not and cannot do 
   anything about all of the artifacts that comes with the use of 
   relative timeouts and schedule_timeout().
   
   basically, using jiffies here (which schedule_timeout() does) is 
   /fundamentally/ imprecise. If you get many interrupts, rounding 
   errors sum up - and there's nothing we can do about it!
  
  Well I did convert futex_wait to an absolute timeout based version in 
  the subsequent incremental patch. I think that is OK?
 
 it still has the rounding artifacts: using timer_list there is no way to 
 do a precise long sleep based on many small sleeps.

OK but that is nothing to do with my patch, but the original futex_wait
implementation.

 even if this means more work for you (i'm sorry about that!) i'm quite 
 sure we should take Sebastien's hrtimers based implementation of 
 futex_wait(), and use the nanosleep method to restart it. There's no 
 point in further tweaking the imprecise approach: whenever some timeout 
 needs to be restarted, it's a candidate for hrtimers.

Absolute timeout is needed, sure. But once that is done, hrtimers does
not fix a bug, does it?

 
 until then, glibc already handles timeouts and restarts it manually.

It isn't timeout handling that is buggy, but EINTR behaviour. And
glibc does not handle that here.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Ingo Molnar

* Nick Piggin [EMAIL PROTECTED] wrote:

  even if this means more work for you (i'm sorry about that!) i'm 
  quite sure we should take Sebastien's hrtimers based implementation 
  of futex_wait(), and use the nanosleep method to restart it. There's 
  no point in further tweaking the imprecise approach: whenever some 
  timeout needs to be restarted, it's a candidate for hrtimers.
 
 Absolute timeout is needed, sure. But once that is done, hrtimers does 
 not fix a bug, does it?

the issue is this: your fix reduces the effects of the bug but it is 
still fundamentally incomplete because of the use of timer_list. So 
instead of trying to fix the bug the wrong way, please try to fix it the 
right way, ontop of an already existing and tested patch, ok? That also 
enables the other neat stuff Thomas talked about.

  until then, glibc already handles timeouts and restarts it manually.
 
 It isn't timeout handling that is buggy, but EINTR behaviour. And 
 glibc does not handle that here.

hm. I'm wondering how this wasnt noticed sooner - this futex_wait 
behavior has been there for like forever.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Andi Kleen
 if HIGH_RES_TIMERS is disabled then that is what happens. But frankly, 

disabled? I would expect it (= more wakeups) when hrtimers are enabled.

 most futex waits are without timeouts - if an application cares about 
 micro-effects like that then you are much better off not using a 
 per-futex timeout anywa

That sounds like you're arguing for not using hrtimers here because
the applications shouldn't depend on precise timeouts here anyways?!? 

Anyways when you convert more kernel timeouts to hrtimers you should
probably add some kind of batching facility that can be globally
configured with a sysctl or similar. Then at least laptops (and possibly
servers) can opt for more power saving again. For the futexes alone
it probably won't matter too much agreed, but I see a trend to more hr.

I also liked the idea (stolen from another popular OS) that a application
can tell the OS how precise it wants its wakeups to be.

-Andi

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Ingo Molnar

* Andi Kleen [EMAIL PROTECTED] wrote:

  if HIGH_RES_TIMERS is disabled then that is what happens. But 
  frankly,
 
 disabled? I would expect it (= more wakeups) when hrtimers are 
 enabled.

i mean the groupping of timer expiries happens automatically when 
high-res is disabled. When high-res is asked for then, duh, it's enabled 
and you get precise timeouts ;-)

  most futex waits are without timeouts - if an application cares 
  about micro-effects like that then you are much better off not using 
  a per-futex timeout anywa
 
 That sounds like you're arguing for not using hrtimers here because 
 the applications shouldn't depend on precise timeouts here anyways?!?

I was talking about the micro-effect of grouping timer expiries.

 Anyways when you convert more kernel timeouts to hrtimers you should 
 probably add some kind of batching facility that can be globally 
 configured with a sysctl or similar. Then at least laptops (and 
 possibly servers) can opt for more power saving again. For the futexes 
 alone it probably won't matter too much agreed, but I see a trend to 
 more hr.

yeah, we had that in earlier versions, it's trivial - nobody used it. So 
i'll wait for the actual measurements and a patch (or i can do the patch 
too, if someone comes up with the measurements). I've added 
/proc/timer_stats and /proc/timer_info for exactly such reasons.

( note that we've added the facility for even more imprecise sleeps to 
  the timer_list APIs - but for hrtimers it's a lot less clear-cut case. )

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Nick Piggin
On Mon, Mar 12, 2007 at 12:19:58PM +0100, Ingo Molnar wrote:
 
 * Nick Piggin [EMAIL PROTECTED] wrote:
 
   even if this means more work for you (i'm sorry about that!) i'm 
   quite sure we should take Sebastien's hrtimers based implementation 
   of futex_wait(), and use the nanosleep method to restart it. There's 
   no point in further tweaking the imprecise approach: whenever some 
   timeout needs to be restarted, it's a candidate for hrtimers.
  
  Absolute timeout is needed, sure. But once that is done, hrtimers does 
  not fix a bug, does it?
 
 the issue is this: your fix reduces the effects of the bug but it is 
 still fundamentally incomplete because of the use of timer_list. So 

But using schedule_timeout is not a bug. Userspace timeouts are always
defined to be at least.

 instead of trying to fix the bug the wrong way, please try to fix it the 
 right way, ontop of an already existing and tested patch, ok? That also 
 enables the other neat stuff Thomas talked about.

Well that's nice, but I have a bugfix here which probably needs to
get backported to stable kernels and distro kernels.

It should be just as easy to rebase the hrtimer patch on top of my
fix. Considering that you've had it for a year, I don't think it 
needs to be added right before my fix.

   until then, glibc already handles timeouts and restarts it manually.
  
  It isn't timeout handling that is buggy, but EINTR behaviour. And 
  glibc does not handle that here.
 
 hm. I'm wondering how this wasnt noticed sooner - this futex_wait 
 behavior has been there for like forever.

People ignore LTP test failures, and programs probably try to avoid
exercising the nuances of the unix signal API, I guess.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Ingo Molnar

* Nick Piggin [EMAIL PROTECTED] wrote:

  the issue is this: your fix reduces the effects of the bug but it is 
  still fundamentally incomplete because of the use of timer_list. So
 
 But using schedule_timeout is not a bug. Userspace timeouts are always 
 defined to be at least.

but what you are adding isnt a plain schedule_timeout(), it is a restart 
block handling loop. And for those restart blocks that relate to 
timeouts, we only use hrtimers. I am not making this up to annoy you: 
take a look at all the current restart block handlers - they are hrtimer 
based, for exactly this reason.

  instead of trying to fix the bug the wrong way, please try to fix it 
  the right way, ontop of an already existing and tested patch, ok? 
  That also enables the other neat stuff Thomas talked about.
 
 Well that's nice, but I have a bugfix here which probably needs to get 
 backported to stable kernels and distro kernels.

yes but your patch already exists for them which they can pick up.

really, this is a common Linux principle: fix it completely and fix it 
the right way. You are applying it yourself on a daily basis when having 
the maintainer hat on =B-)

 It should be just as easy to rebase the hrtimer patch on top of my 
 fix. Considering that you've had it for a year, I don't think it needs 
 to be added right before my fix.

your latest patch looks quite kludgy, exactly due to the issues that 
were mentioned.

  hm. I'm wondering how this wasnt noticed sooner - this futex_wait 
  behavior has been there for like forever.
 
 People ignore LTP test failures, and programs probably try to avoid 
 exercising the nuances of the unix signal API, I guess.

then there's no rush and lets do this the right way, ok?

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Nick Piggin
On Mon, Mar 12, 2007 at 12:38:29PM +0100, Ingo Molnar wrote:
 
 * Nick Piggin [EMAIL PROTECTED] wrote:
 
   the issue is this: your fix reduces the effects of the bug but it is 
   still fundamentally incomplete because of the use of timer_list. So
  
  But using schedule_timeout is not a bug. Userspace timeouts are always 
  defined to be at least.
 
 but what you are adding isnt a plain schedule_timeout(), it is a restart 
 block handling loop. And for those restart blocks that relate to 
 timeouts, we only use hrtimers. I am not making this up to annoy you: 
 take a look at all the current restart block handlers - they are hrtimer 
 based, for exactly this reason.

So why do you say it is fundamentally incomplete?

   instead of trying to fix the bug the wrong way, please try to fix it 
   the right way, ontop of an already existing and tested patch, ok? 
   That also enables the other neat stuff Thomas talked about.
  
  Well that's nice, but I have a bugfix here which probably needs to get 
  backported to stable kernels and distro kernels.
 
 yes but your patch already exists for them which they can pick up.
 
 really, this is a common Linux principle: fix it completely and fix it 
 the right way. You are applying it yourself on a daily basis when having 
 the maintainer hat on =B-)

I still didn't get anything wrong pointed out with the patch, though.

I'm not arguing against using hrtimers here to fix it the right way.

  It should be just as easy to rebase the hrtimer patch on top of my 
  fix. Considering that you've had it for a year, I don't think it needs 
  to be added right before my fix.
 
 your latest patch looks quite kludgy, exactly due to the issues that 
 were mentioned.

I don't see what is kludgy, unless you consider converting to and from
absolute timeouts. But the userspace API is relative time based, so
hrtimers doesn't change that.

   hm. I'm wondering how this wasnt noticed sooner - this futex_wait 
   behavior has been there for like forever.
  
  People ignore LTP test failures, and programs probably try to avoid 
  exercising the nuances of the unix signal API, I guess.
 
 then there's no rush and lets do this the right way, ok?

There is no rush to use hrtimers. I would have thought it fairly important
to actually reach correctness, though. We're not talking about completely
changing the design of something such that it will take a lot of work to
fix it properly. If that were the issue, then I would consider the
hrtimer conversion as part of the fix.

And if you talk about doing it the right way, then I don't think it is
strictly the right way to reimplement the function, including known bugs,
to be slightly more efficient, and *then* fixing those bugs. I'd actually
consider it better to fix the bugs first, not only because of the backport
issue, but because it generally makes it easier to track the injection and
removal points of bugs in the history.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Ingo Molnar

* Nick Piggin [EMAIL PROTECTED] wrote:

the issue is this: your fix reduces the effects of the bug but 
it is still fundamentally incomplete because of the use of 
timer_list. So
   
   But using schedule_timeout is not a bug. Userspace timeouts are 
   always defined to be at least.
  
  but what you are adding isnt a plain schedule_timeout(), it is a 
  restart block handling loop. And for those restart blocks that 
  relate to timeouts, we only use hrtimers. I am not making this up to 
  annoy you: take a look at all the current restart block handlers - 
  they are hrtimer based, for exactly this reason.
 
 So why do you say it is fundamentally incomplete?

because i misread your last patch :-) I thought it still has a window 
for inaccuracy, but you are right: it should be at most 1 jiffy 
inaccurate, no matter how many times we restart.

still ... the hrtimers patch has been submitted to lkml before yours, 
and has been tested extensively, so why go the extra side-jump 
prolonging the jiffies sleep method? The LTP failure has been there 
since the inception of the futex code i suspect. Going this way also 
enables the addressing of a more pressing need: the elimination of 
glibc's forced use of relative futex timeouts.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Nick Piggin
On Mon, Mar 12, 2007 at 01:21:03PM +0100, Ingo Molnar wrote:
 
 * Nick Piggin [EMAIL PROTECTED] wrote:
 
 the issue is this: your fix reduces the effects of the bug but 
 it is still fundamentally incomplete because of the use of 
 timer_list. So

But using schedule_timeout is not a bug. Userspace timeouts are 
always defined to be at least.
   
   but what you are adding isnt a plain schedule_timeout(), it is a 
   restart block handling loop. And for those restart blocks that 
   relate to timeouts, we only use hrtimers. I am not making this up to 
   annoy you: take a look at all the current restart block handlers - 
   they are hrtimer based, for exactly this reason.
  
  So why do you say it is fundamentally incomplete?
 
 because i misread your last patch :-) I thought it still has a window 
 for inaccuracy, but you are right: it should be at most 1 jiffy 
 inaccurate, no matter how many times we restart.

OK, no problem.

 still ... the hrtimers patch has been submitted to lkml before yours, 
 and has been tested extensively, so why go the extra side-jump 
 prolonging the jiffies sleep method? The LTP failure has been there 
 since the inception of the futex code i suspect. Going this way also 
 enables the addressing of a more pressing need: the elimination of 
 glibc's forced use of relative futex timeouts.

I guess my arguments are that my patch fixes a bug, which gives it a
higher priority (being a userspace API bug, perhaps even 2.6.21); and
that it will want to be backported while the hrtimer patch will not, so
including the hrtimer patch first means 2 different patches to fix the
same bug.

I'm not trying to make life harder for the hrtimer patch. I will even
volunteer to forward port it on top of the restart fix, if that is an
issue.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Theodore Tso
On Mon, Mar 12, 2007 at 11:58:26AM +0100, Andi Kleen wrote:
 On Mon, Mar 12, 2007 at 12:00:20PM +0100, Thomas Gleixner wrote:
  On Mon, 2007-03-12 at 12:27 +0100, Andi Kleen wrote:
   Ingo Molnar [EMAIL PROTECTED] writes:

the only correct approach is the use of hrtimers, and a patch exists 
for 
that - see below. This has been included in -rt for quite some time.
   
   But isn't that bad for power management? You'll likely get more
   idle wakeups, won't you?
  
  Why so ? It comes more precise, but only once.
 
 When it's clustered around the jiffies interval then wakeups from
 multiple processes will be somewhat batched. With a precise wakeup you'll
 get wakeups all over the jiffies period, won't you?

What we probably need in the long-term, and not just for high
precision wakeups, is we need a way for waiters (either in the kernel
or in userspace) to specify a desired precision in their timers.  Is
it, wake me up in a second, exactly, or wake me up in a second,
plus or minus 10ms?   (or 50ms?  or 100ms?).

This becomes especially important if we want the tickless code to
really shine as far as power management is concerned.  Unfortunately,
the POSIX timer abstraction doesn't give this kind of flexibility
easily, so it's going to be a while before we see significant
userspace adoption of such a kernel feature, but I think it's
something that would be still worthwhile to add.

Regards,

- Ted
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Ingo Molnar

* Theodore Tso [EMAIL PROTECTED] wrote:

 What we probably need in the long-term, and not just for high 
 precision wakeups, is we need a way for waiters (either in the kernel 
 or in userspace) to specify a desired precision in their timers.  Is 
 it, wake me up in a second, exactly, or wake me up in a second, 
 plus or minus 10ms?  (or 50ms?  or 100ms?).

such a facility exists already, see round_jiffies() and 
round_jiffies_relative(). There's some short blurb about it at:

  
http://kernelnewbies.org/LinuxChanges#head-513ceda14f5d8cf5b8a7c81d7e3821543141ecb0

 This becomes especially important if we want the tickless code to 
 really shine as far as power management is concerned. [...]

yes. That's why we also implemented /proc/timer_stat, and this was 
measured and a few higher-frequency fuzzy waiters were converted to use 
round_jiffies(). Some other waiters were fixed in user-space. It's all 
dependent on actual measurements and circumstances.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Nick Piggin
On Mon, Mar 12, 2007 at 10:12:14AM -0400, Theodore Tso wrote:
 On Mon, Mar 12, 2007 at 11:58:26AM +0100, Andi Kleen wrote:
  On Mon, Mar 12, 2007 at 12:00:20PM +0100, Thomas Gleixner wrote:
   On Mon, 2007-03-12 at 12:27 +0100, Andi Kleen wrote:
Ingo Molnar [EMAIL PROTECTED] writes:
 
 the only correct approach is the use of hrtimers, and a patch exists 
 for 
 that - see below. This has been included in -rt for quite some time.

But isn't that bad for power management? You'll likely get more
idle wakeups, won't you?
   
   Why so ? It comes more precise, but only once.
  
  When it's clustered around the jiffies interval then wakeups from
  multiple processes will be somewhat batched. With a precise wakeup you'll
  get wakeups all over the jiffies period, won't you?
 
 What we probably need in the long-term, and not just for high
 precision wakeups, is we need a way for waiters (either in the kernel
 or in userspace) to specify a desired precision in their timers.  Is
 it, wake me up in a second, exactly, or wake me up in a second,
 plus or minus 10ms?   (or 50ms?  or 100ms?).

Would this work, or will it just create more confusion for the API user?
I mean, all sleeps can only guarantee no less than.

Would it be enough for a binary (exact as possible / relaxed if needed)
flag? Or perhaps ternary (exact/relaxed/batched), where relaxed could
add an extra jiffy or so, and batched is really relaxed that may delay
up to double the value of the timeout.

 This becomes especially important if we want the tickless code to
 really shine as far as power management is concerned.  Unfortunately,
 the POSIX timer abstraction doesn't give this kind of flexibility
 easily, so it's going to be a while before we see significant
 userspace adoption of such a kernel feature, but I think it's
 something that would be still worthwhile to add.

But given that we know most userspace API timeouts are broadly just an
equal to or greater, then we could add another timeout flag to specify
it is a userspace timeout, and make that controllable by sysctl.

Sure it isn't ideal, but for those who really want power / hypervisor
savings, it could be useful.

BTW. my futex man page says timeout's contents describe the maximum duration
of the wait. Surely that should be *minimum*? Michael cc'ed.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] change futex_wait() to hrtimers

2007-03-12 Thread Andi Kleen
 This becomes especially important if we want the tickless code to
 really shine as far as power management is concerned.  Unfortunately,
 the POSIX timer abstraction doesn't give this kind of flexibility
 easily, so it's going to be a while before we see significant
 userspace adoption of such a kernel feature, but I think it's
 something that would be still worthwhile to add.

I suspect it would be overkill to specify that on every sleep operation.
99% of applications won't care and for the 1% leftover a global per 
process setting should be fine.
I think a single prctl() and a global sysctl as default would be enough.

-Andi

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/