Re: [PATCH] msleep() with hrtimers

2007-08-06 Thread Roman Zippel
Hi,

On Sun, 5 Aug 2007, Arjan van de Ven wrote:

> > There's no problem with providing a high-resolution sleep, but there is also 
> > no reason to mess with msleep - don't fix what ain't broken...
> 
> John Corbet provided the patch because he had a problem with the current
> msleep... in that it didn't provide as good a common case as he
> wanted... so I think your statement is wrong ;)

Only under the assumption that msleep _must_ be fixed for all other 
current users too.
Give users a choice between msleep and nanosleep - how do you know what's 
best for them?

bye, Roman


Re: [PATCH] msleep() with hrtimers

2007-08-05 Thread Roman Zippel
Hi,

On Sun, 5 Aug 2007, Arjan van de Ven wrote:

> Timers are coarse resolution that is highly HZ-value dependent. For
> cases where you want a finer resolution, the kernel now has a way to
> provide that functionality... so why not use the quality of service this
> provides..

We're going in circles here. We have two different timer APIs for a 
reason; just because hrtimers provide better resolution doesn't 
automatically make them the better generic timer.
There's no problem with providing a high-resolution sleep, but there is also 
no reason to mess with msleep - don't fix what ain't broken...

bye, Roman


Re: [PATCH] msleep() with hrtimers

2007-08-05 Thread Roman Zippel
Hi,

On Sat, 4 Aug 2007, Arjan van de Ven wrote:

> > hr_msleep makes no sense. Why should we tie this interface to millisecond 
> > resolution?
> 
> because a lot of parts of the kernel think and work in milliseconds,
> it's logical and USEFUL to at least provide an interface that works on
> milliseconds.

If millisecond resolution is enough for these users, the current msleep 
will work fine for them.

> > Your suggested msleep_approx doesn't make much sense to me either, since 
> > neither interface guarantees anything and may "approximate" the sleep 
> > (and if the user is surprised by that something else already went wrong).
> 
> an interface should try to map to the implementation that provides the
> best implementation quality of the requested thing in general. That's
> the hrtimers based msleep().

This generalization is simply not true. First, it requires the 
HIGH_RES_TIMERS option to be enabled to make a real difference. 
Second, an hrtimer-based msleep has a higher setup cost, which can't be 
completely ignored. "Best" is a subjective term here and can't be that 
easily generalized to all current users.

> > If you don't like the hrsleep name, we can also call it nanosleep and so 
> > match what we already do for userspace.
> 
> having a nanosleep *in addition* to msleep (or maybe nsleep() and
> usleep() to have consistent naming) sounds reasonable to me.

We only need one implementation of each sleep, and msleep is a fine name 
for the current one - not only does it describe the unit, it also 
describes the best resolution one can expect from it.

> Do you have something against hrtimer use in general? From your emails
> on this msleep topic it sort of seems you do 

I can turn the question around: what do you have against simple timers, that 
you want to make them as awkward as possible to use?
hrtimers have a higher usage cost depending on the clock source, so using 
them simply because they are the new cool kid in town doesn't make 
sense. It may not be that critical for a simple sleep implementation, but 
that only means we should keep the API as simple as possible: one 
low-resolution, cheap msleep and one high-resolution nanosleep is 
enough. Why do you insist on making it more complex than necessary?
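
For illustration, a minimal sketch of what that split could look like:
msleep() roughly as it is implemented today on top of jiffies timers
(from memory), plus a hypothetical hrsleep() - the name and the
nanosecond argument are only placeholders, and the schedule_hrtimeout()
call below merely stands in for whatever hrtimer-based wait would
actually be used:

#include <linux/delay.h>
#include <linux/jiffies.h>
#include <linux/hrtimer.h>
#include <linux/sched.h>

/* the existing, cheap, jiffies-based sleep (roughly kernel/timer.c) */
void msleep(unsigned int msecs)
{
	unsigned long timeout = msecs_to_jiffies(msecs) + 1;

	while (timeout)
		timeout = schedule_timeout_uninterruptible(timeout);
}

/*
 * Hypothetical high-resolution counterpart, sketched with a
 * schedule_hrtimeout()-style helper; a real version would use the
 * hrtimer primitives directly.
 */
static void hrsleep(u64 nsecs)
{
	ktime_t timeout = ns_to_ktime(nsecs);

	set_current_state(TASK_UNINTERRUPTIBLE);
	schedule_hrtimeout(&timeout, HRTIMER_MODE_REL);
}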

bye, Roman

Re: [PATCH] msleep() with hrtimers

2007-08-03 Thread Roman Zippel
Hi,

On Fri, 3 Aug 2007, Arjan van de Ven wrote:

> > Actually the hrsleep() function would allow for submillisecond sleeps, 
> > which might be what some of the 450 users really want and they only use
> > msleep(1) because it's the next best thing.
> > A hrsleep() function is really what makes most sense from an API 
> > perspective.
> 
> I respectfully disagree. The power of msleep is that the unit of sleep
> time is in the name; so in your proposal it would be hr_msleep or
> somesuch. I much rather do the opposite in that case; make the "short"
> name be the best implementation of the requested behavior, and have
> qualifiers for allowing exceptions to that... least surprise and all
> that.

hr_msleep makes no sense. Why should we tie this interface to millisecond 
resolution?
Your suggested msleep_approx doesn't make much sense to me either, since 
neither interface guarantees anything and may "approximate" the sleep 
(and if the user is surprised by that something else already went wrong).
If you don't like the hrsleep name, we can also call it nanosleep and so 
match what we already do for userspace.


bye, Roman


Re: [PATCH] msleep() with hrtimers

2007-08-03 Thread Roman Zippel
Hi,

On Fri, 3 Aug 2007, Arjan van de Ven wrote:

> On Fri, 2007-08-03 at 21:19 +0200, Roman Zippel wrote:
> > Hi,
> > 
> > On Fri, 3 Aug 2007, Jonathan Corbet wrote:
> > 
> > > Most comments last time were favorable.  The one dissenter was Roman,
> > > who worries about the overhead of using hrtimers for this operation; my
> > > understanding is that he would rather see a really_msleep() function for
> > > those who actually want millisecond resolution.  I'm not sure how to
> > > characterize what the cost could be, but it can only be buried by the
> > > fact that every call sleeps for some number of milliseconds.  On my
> > > system, the several hundred total msleep() calls can't cause any real
> > > overhead, and almost all happen at initialization time.
> > 
> > The main point is still that these are two _different_ APIs for different 
> > usages, so I still prefer to add a hrsleep() instead.
> 
> 
> I would actually prefer it the other way around; call the
> not-so-accurate one "msleep_approx()" or somesuch, to make it explicit
> that the sleep is only approximate...

Actually the hrsleep() function would allow for submillisecond sleeps, 
which might be what some of the 450 users really want and they only use
msleep(1) because it's the next best thing.
A hrsleep() function is really what makes most sense from an API 
perspective.

bye, Roman


Re: [PATCH] msleep() with hrtimers

2007-08-03 Thread Roman Zippel
Hi,

On Fri, 3 Aug 2007, Jonathan Corbet wrote:

> Most comments last time were favorable.  The one dissenter was Roman,
> who worries about the overhead of using hrtimers for this operation; my
> understanding is that he would rather see a really_msleep() function for
> those who actually want millisecond resolution.  I'm not sure how to
> characterize what the cost could be, but it can only be buried by the
> fact that every call sleeps for some number of milliseconds.  On my
> system, the several hundred total msleep() calls can't cause any real
> overhead, and almost all happen at initialization time.

The main point is still that these are two _different_ APIs for different 
usages, so I still prefer to add a hrsleep() instead.

bye, Roman

Re: CFS review

2007-08-02 Thread Roman Zippel
Hi,

On Wed, 1 Aug 2007, Linus Torvalds wrote:

> So I think it would be entirely appropriate to
> 
>  - do something that *approximates* microseconds.
> 
>Using microseconds instead of nanoseconds would likely allow us to do 
>32-bit arithmetic in more areas, without any real overflow.

The basic problem is that one needs a number of bits (at least 16) for 
normalization, which limits the time range one can work with. This means 
that 32 bits leave only room for 1 millisecond resolution (the remaining 
16 bits at 1 ms cover a range of about a minute, whereas at microsecond 
resolution they would cover only about 65 ms); the remainder could maybe 
be saved and reused later.
So AFAICT using micro- or nanosecond resolution doesn't make much 
computational difference.

bye, Roman


Re: CFS review

2007-08-02 Thread Roman Zippel
Hi,

On Thu, 2 Aug 2007, Ingo Molnar wrote:

> Most importantly, CFS _already_ includes a number of measures that act 
> against too frequent math. So even though you can see 64-bit math code 
> in it, it's only rarely called if your clock has a low resolution - and 
> that happens all automatically! (see below the details of this buffered 
> delta math)
> 
> I have not seen Roman notice and mention any of these important details 
> (perhaps because he was concentrating on finding faults in CFS - which a 
> reviewer should do), but those measures are still very important for a 
> complete, balanced picture, especially if one focuses on overhead on 
> small boxes where the clock is low-resolution.
> 
> As Peter has said it in his detailed review of Roman's suggested 
> algorithm, our main focus is on keeping total complexity down - and we 
> are (of course) fundamentally open to changing the math behind CFS, we 
> ourselves tweaked it numerous times, it's not cast into stone in any 
> way, shape or form.

You're comparing apples with oranges; I explicitly said:

"At this point I'm not that much interested in a few localized 
optimizations, what I'm interested in is how this can be optimized at the 
design level"

IMO it's very important to keep computational and algorithmic complexity 
separate; I want to concentrate on the latter, so unless you can _prove_ 
that a similar set of optimizations is impossible within my example, I'm 
going to ignore them for now. CFS has already gone through several 
versions of optimization and tuning, so expecting the same from my design 
prototype is a little confusing...

I want to analyze the foundation CFS is based on; in the review I 
mentioned a number of other issues and design-related questions. If you 
need more time, that's fine, but I'd appreciate more background 
information related to that rather than you only jumping on the more 
trivial issues.

> In Roman's variant of CFS's algorithm the variables are 32-bit, but the 
> error is rolled forward in separate fract_* (fractional) 32-bit 
> variables, so we still have 32+32==64 bit of stuff to handle. So we 
> think that in the end such a 32+32 scheme would be more complex (and 
> anyone who took a look at fs2.c would i think agree - it took Peter a 
> day to decipher the math!)

Come on, Ingo, you can do better than that; I did mention in my review 
some of the requirements for the data types.
I'm amazed how you arrived at that judgement so quickly - could you please 
substantiate it a little more?

I admit that the lack of source comments is an open invitation for further 
questions, and Peter did exactly that and his comments were great - I'm 
hoping for more like them. You OTOH jump to conclusions based on a partial 
understanding of what I'm actually trying to do.
Ingo, how about you provide some of the mathematical proof CFS is based 
on? Can you prove that the rounding errors are irrelevant? Can you prove 
that all the limit checks can have no adverse effect? I tried that and I'm 
not entirely convinced, but maybe it's just me, so I'd love to see 
someone else's attempt at this.
A major goal of my design is to be able to define the limits within which 
the scheduler works correctly, so I know which information is relevant 
and what can be approximated.

bye, Roman


Re: CFS review

2007-08-02 Thread Roman Zippel
Hi,

On Wed, 1 Aug 2007, Peter Zijlstra wrote:

> Took me most of today trying to figure out WTH you did in fs2.c, more
> math and fundamental explanations would have been good. So please bear
> with me as I try to recap this thing. (No, your code was very much _not_
> obvious, a few comments and broken out functions would have made a world
> of a difference)

Thanks for the effort though. :)
I know I'm not the best at explaining these things, so I really appreciate 
the questions - they tell me what to concentrate on.

> So, for each task we keep normalised time
> 
>  normalised time := time/weight
> 
> using Bresenham's algorithm we can do this perfectly (up until a renice
> - where you'd get errors)
> 
> avg_frac += weight_inv
> 
> weight_inv = X / weight
> 
> avg = avg_frac / weight0_inv
> 
> weight0_inv = X / weight0
> 
> avg = avg_frac / (X / weight0) 
> = (X / weight) / (X / weight0) 
> = X / weight * weight0 / X 
> = weight0 / weight
> 
> 
> So avg ends up being in units of [weight0/weight].
> 
> Then, in order to allow sleeping, we need to have a global clock to sync
> with. Its this global clock that gave me headaches to reconstruct.
> 
> We're looking for a time like this:
> 
>   rq_time := sum(time)/sum(weight)
> 
> And you commented that the /sum(weight) part is where CFS obtained its
> accumulating rounding error? (I'm inclined to believe the error will
> statistically be 0, but I'll readily accept otherwise if you can show a
> practical 'exploit')
> 
> Its not obvious how to do this using modulo logic like Bresenham because
> that would involve using a gcm of all possible weights.

I think I've sent you off in the wrong direction somehow. Sorry. :)

Let's ignore the average for a second; normalized time is maintained as:

normalized time := time * (2^16 / weight)

The important point is that I keep the value in full resolution of 2^-16 
vsec units (vsec for virtual second or sec/weight, where every task gets 
weight seconds for every virtual second; to keep things simpler I also 
omit the nano prefix from the units for a moment). Compared to that, CFS 
maintains a global normalized value in 1 vsec units.
Since I don't round the value down I avoid the accumulating error; this 
means that 

time_norm += time_delta1 * (2^16 / weight)
time_norm += time_delta2 * (2^16 / weight)

is the same as

time_norm += (time_delta1 + time_delta2) * (2^16 / weight)

CFS for example does this

delta_mine = calc_delta_mine(delta_exec, curr->load.weight, lw);

in above terms this means

time = time_delta * weight * (2^16 / weight_sum) / 2^16

The last shift rounds the value down, and if one does that 1000 times 
per second, the resolution of the value that is finally accounted to 
wait_runtime is reduced accordingly.
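
To make the difference concrete, a toy user-space sketch (my own
simplified names, not code from CFS or fs2.c; overflow handling is
ignored):

#include <stdint.h>

#define SHIFT 16

/*
 * Full-resolution accumulation: the 2^-16 fractional part stays in the
 * accumulator, so splitting a delta into many small pieces changes
 * nothing.
 */
static void account_full(uint64_t *time_norm, uint32_t delta_ns,
			 uint32_t weight_inv)	/* (1 << SHIFT) / weight */
{
	*time_norm += (uint64_t)delta_ns * weight_inv;	/* 2^-16 vsec units */
}

/*
 * Per-call rounding: the final shift throws away up to 2^16 - 1 on every
 * call, so the more often this runs (e.g. 1000 times per second), the
 * more resolution the accumulated result loses.
 */
static void account_rounded(uint64_t *time_ns, uint32_t delta_ns,
			    uint32_t weight,
			    uint32_t weight_sum_inv) /* (1 << SHIFT) / weight_sum */
{
	*time_ns += ((uint64_t)delta_ns * weight * weight_sum_inv) >> SHIFT;
}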

The other rounding problem comes from the fact that the term

x * prio_to_weight[i] * prio_to_wmult[i] / 2^32

doesn't reproduce x for most values in those tables (the same applies to the 
weight sum), so if we have chains where the values are converted from one 
scale to the other, a rounding error is produced. In CFS this happens 
because wait_runtime is maintained in nanoseconds while fair_clock is a 
normalized value.

The problem here isn't that these errors might have a statistical 
relevance; they are usually completely overshadowed by measurement 
errors anyway. The problem is that these errors exist at all, which means 
they have to be compensated for somehow, so that they don't accumulate over 
time and become significant. This also has to be seen in the context 
of the overflow checks. All this adds a number of variables to the system, 
which considerably increases complexity and makes a thorough analysis 
quite challenging.

So to get back to the average, if you look for this

rq_time := sum(time)/sum(weight)

you won't find it in this form; it is basically a weighted average, 
and I agree it can't really be maintained via the modulo logic (at least 
AFAICT), so I'm using a simple average instead. So if we have:

time_norm = time/weight

we can write your rq_time like this:

weighted_avg = sum_{i}^{N}(time_norm_{i}*weight_{i})/sum_{i}^{N}(weight_{i})

this is the formula for a weighted average, so we can approximate the value 
using a simple average instead:

avg = sum_{i}^{N}(time_norm_{i})/N

This sum is now what I maintain at runtime incrementally:

time_{i} = sum_{j}^{S}(time_{j})

time_norm_{i} = time_{i}/weight_{i}
  = sum_{j}^{S}(time_{j})/weight_{i}
  = sum_{j}^{S}(time_{j}/weight_{i})

If I add this up and add weight0 I get:

avg*N*weight0 = sum_{i}^{N}(time_norm_{i})*weight0

and now I also have the needed modulo factors.
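
A toy sketch of how such a sum can be maintained incrementally without
losing the remainder (again my own simplified names, not fs2.c, and it
assumes the number of tasks stays constant between updates):

#include <stdint.h>

/*
 * Bresenham-style maintenance of avg = sum(time_norm) / nr_tasks:
 * the remainder is carried in avg_frac instead of being rounded away,
 * so many small updates give exactly the same result as one big one.
 */
static void avg_update(uint64_t *avg, uint64_t *avg_frac,
		       uint64_t delta_norm, uint32_t nr_tasks)
{
	*avg_frac += delta_norm;
	*avg += *avg_frac / nr_tasks;
	*avg_frac %= nr_tasks;
}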

The average probably could be further simplified by using a different 
approximation. 


Re: CFS review

2007-08-01 Thread Roman Zippel
Hi,

On Wed, 1 Aug 2007, Ingo Molnar wrote:

> > [...] I didn't say 'sleeper starvation' or 'rounding error', these are 
> > your words and it's your perception of what I said.
> 
> Oh dear :-) It was indeed my perception that yesterday you said:

*sigh* and here you go off again nitpicking on a minor issue just to prove 
your point...
When I wrote the earlier stuff I hadn't realized it was resolution-related, 
so things have to be put into proper context, and you make it a little easy 
on yourself by equating them.
Yippee, you found another small error I made - can we drop this now? Please?

bye, Roman


Re: CFS review

2007-08-01 Thread Roman Zippel
Hi,

On Wed, 1 Aug 2007, Ingo Molnar wrote:

> Andi's theory cannot be true either, Roman's debug info also shows this 
> /proc/<PID>/sched data:
> 
>   clock-delta  :  95
> 
> that means that sched_clock() is in high-res mode, the TSC is alive and 
> kicking and a sched_clock() call took 95 nanoseconds.
> 
> Roman, could you please help us with this mystery?

Actually, Andi is right. What I sent you was generated directly after 
boot, as I had to reboot for the right kernel, so a little later this 
appeared:

Aug  1 14:54:30 spit kernel: eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
Aug  1 15:09:56 spit kernel: Clocksource tsc unstable (delta = 656747233 ns)
Aug  1 15:09:56 spit kernel: Time: pit clocksource has been installed.

bye, Roman


Re: CFS review

2007-08-01 Thread Roman Zippel
Hi,

On Wed, 1 Aug 2007, Ingo Molnar wrote:

> > > in that case 'top' accounting symptoms similar to the above are not 
> > > due to the scheduler starvation you suspected, but due the effect of 
> > > a low-resolution scheduler clock and a tightly coupled 
> > > timer/scheduler tick to it.
> > 
> > Well, it magnifies the rounding problems in CFS.
> 
> why do you say that? 2.6.22 behaves similarly with a low-res 
> sched_clock(). This has nothing to do with 'rounding problems'!
> 
> i tried your fl.c and if sched_clock() is high-resolution it's scheduled 
> _perfectly_ by CFS:
> 
>PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
>   5906 mingo 20   0  1576  244  196 R 71.2  0.0   0:30.11 l
>   5909 mingo 20   0  1844  344  260 S  9.6  0.0   0:04.02 lt
>   5907 mingo 20   0  1844  508  424 S  9.5  0.0   0:04.01 lt
>   5908 mingo 20   0  1844  344  260 S  9.5  0.0   0:04.02 lt
> 
> if sched_clock() is low-resolution then indeed the 'lt' tasks will 
> "hide":
> 
>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
>  2366 mingo 20   0  1576  248  196 R 99.9  0.0   0:07.95 loop_silent
> 1 root  20   0  2132  636  548 S  0.0  0.0   0:04.64 init
> 
> but that's nothing new. CFS cannot conjure up time measurement methods 
> that do not exist. If you have a low-res clock and if you create an app 
> that syncs precisely to the tick of that clock via timers that run off 
> that exact tick then there's nothing the scheduler can do about it. It 
> is false to characterise this as 'sleeper starvation' or 'rounding 
> error' like you did. No amount of rounding logic can create a 
> high-resolution clock out of thin air.

Please calm down. You apparently already get worked up about one of the 
secondary problems. I didn't say 'sleeper starvation' or 'rounding 
error'; these are your words and your perception of what I said.

sched_clock() can have a low resolution, which can be a problem for the 
scheduler. That is all this program demonstrates. If and how this problem 
should be solved is a completely different issue, about which I haven't 
said anything yet, and since it's not that important right now I'll leave 
it at that.

bye, Roman


Re: CFS review

2007-08-01 Thread Roman Zippel
Hi,

On Wed, 1 Aug 2007, Andi Kleen wrote:

> > especially if one already knows that
> > scheduler clock has only limited resolution (because it's based on
> > jiffies), it becomes possible to use mostly 32bit values.
> 
> jiffies based sched_clock should be soon very rare. It's probably
> not worth optimizing for it.

I'm not so sure about that. sched_clock() has to be fast, so many archs 
may want to continue to use jiffies. As soon as one does that, one can also 
save a lot of computational overhead by using 32-bit instead of 64-bit 
values. The question is then how easily that can be done.
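
For reference, the jiffies-based fallback sched_clock() is essentially
just scaled jiffies (sketched from memory; details differ between kernel
versions and architectures), which is both why it is cheap and why it
only has tick resolution:

#include <linux/jiffies.h>
#include <linux/time.h>

/* generic fallback: one jiffy of resolution, basically free to read */
unsigned long long sched_clock(void)
{
	return (unsigned long long)jiffies * (NSEC_PER_SEC / HZ);
}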

bye, Roman


Re: CFS review

2007-08-01 Thread Roman Zippel
Hi,

On Wed, 1 Aug 2007, Ingo Molnar wrote:

> Please also send me the output of this script:
> 
>   http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh

Send privately.

> Could you also please send the source code for the "l.c" and "lt.c" apps
> you used for your testing so i can have a look. Thanks!

l.c is a simple busy loop (well, with the option to start many of them).
This is lt.c; what it does is run a bit less than a jiffy, so it 
needs a low-resolution clock to trigger the problem:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <time.h>
#include <sys/time.h>

#define NSEC 1000000000
#define USEC 1000000

#define PERIOD  (NSEC / 1000)		/* 1 ms */

int i;

void worker(int sig)
{
	struct timeval tv;
	long long t0, t;

	gettimeofday(&tv, 0);
	//printf("%u,%lu\n", i, tv.tv_usec);
	/* busy-loop for slightly less than one tick (PERIOD minus 50 us) */
	t0 = (long long)tv.tv_sec * USEC + tv.tv_usec + PERIOD / 1000 - 50;
	do {
		gettimeofday(&tv, 0);
		t = (long long)tv.tv_sec * USEC + tv.tv_usec;
	} while (t < t0);
}

int main(int ac, char **av)
{
	int cnt;
	timer_t timer;
	struct itimerspec its;
	struct sigaction sa;

	cnt = i = atoi(av[1]);

	sa.sa_handler = worker;
	sa.sa_flags = 0;
	sigemptyset(&sa.sa_mask);

	sigaction(SIGALRM, &sa, 0);

	clock_gettime(CLOCK_MONOTONIC, &its.it_value);
	its.it_interval.tv_sec = 0;
	its.it_interval.tv_nsec = PERIOD * cnt;

	/* start cnt processes; each keeps its own value of i */
	while (--i > 0 && fork() > 0)
		;

	/* stagger the absolute start times so the instances run back to back */
	its.it_value.tv_nsec += i * PERIOD;
	if (its.it_value.tv_nsec > NSEC) {
		its.it_value.tv_sec++;
		its.it_value.tv_nsec -= NSEC;
	}

	timer_create(CLOCK_MONOTONIC, 0, &timer);
	timer_settime(timer, TIMER_ABSTIME, &its, 0);

	printf("%u,%lu\n", i, its.it_interval.tv_nsec);

	while (1)
		pause();
	return 0;
}



Re: CFS review

2007-08-01 Thread Roman Zippel
Hi,

On Wed, 1 Aug 2007, Ingo Molnar wrote:

> * Roman Zippel <[EMAIL PROTECTED]> wrote:
> 
> > [...] the increase in code size:
> > 
> > 2.6.22:
> >textdata bss dec hex filename
> >   10150  243344   1351834ce kernel/sched.o
> > 
> > recent git:
> >textdata bss dec hex filename
> >   14724 2282020   16972424c kernel/sched.o
> > 
> > That's i386 without stats/debug. [...]
> 
> that's without CONFIG_SMP, right? :-) On SMP they are about net break 
> even:
> 
>  textdata bss dec hex filename
> 265354173  24   30732780c kernel/sched.o-2.6.22
> 283782574  16   3096878f8 kernel/sched.o-2.6.23-git

That's still quite an increase in some rather important code paths, and 
it's not just the code size but also the code complexity that matters 
- a major point I tried to address in my review.

> (plus a further ~1.5K per CPU data reduction which is not visible here) 

That's why I mentioned the increased runtime memory usage...

bye, Roman


Re: CFS review

2007-08-01 Thread Roman Zippel
Hi,

On Wed, 1 Aug 2007, Ingo Molnar wrote:

> > [...] e.g. in this example there are three tasks that run only for 
> > about 1ms every 3ms, but they get far more time than should have 
> > gotten fairly:
> > 
> >  4544 roman 20   0  1796  520  432 S 32.1  0.4   0:21.08 lt
> >  4545 roman 20   0  1796  344  256 R 32.1  0.3   0:21.07 lt
> >  4546 roman 20   0  1796  344  256 R 31.7  0.3   0:21.07 lt
> >  4547 roman 20   0  1532  272  216 R  3.3  0.2   0:01.94 l
> 
> Mike and me have managed to reproduce similarly looking 'top' output, 
> but it takes some effort: we had to deliberately run a non-TSC 
> sched_clock(), CONFIG_HZ=100, !CONFIG_NO_HZ and !CONFIG_HIGH_RES_TIMERS.

I used my old laptop for these tests, where tsc is indeed disabled due to 
instability. Otherwise the kernel was configured with CONFIG_HZ=1000.

> in that case 'top' accounting symptoms similar to the above are not due 
> to the scheduler starvation you suspected, but due the effect of a 
> low-resolution scheduler clock and a tightly coupled timer/scheduler 
> tick to it.

Well, it magnifies the rounding problems in CFS.
I mainly wanted to test the behaviour of CFS a little, and I thought I saw a 
patch which enabled the use of the TSC in these cases, so I didn't check 
sched_clock().

Anyway, I want to point out that this wasn't the main focus of what I 
wrote.

bye, Roman

Re: [ck] Re: Linus 2.6.23-rc1

2007-07-31 Thread Roman Zippel
Hi,

On Sat, 28 Jul 2007, Linus Torvalds wrote:

> We've had people go with a splash before. Quite frankly, the current 
> scheduler situation looks very much like the CML2 situation. Anybody 
> remember that? The developer there also got rejected, the improvement was 
> made differently (and much more in line with existing practices and 
> maintainership), and life went on. Eric Raymond, however, left with a 
> splash.

Since I was directly involved I'd like to point out a key difference.

http://lkml.org/lkml/2002/2/21/57 was the very first start of Kconfig, and 
initially I didn't plan on writing a new config system. At the beginning 
there was only the converter, which I did to address the issue that Eric 
created a completely new and different config database, so the converter was 
meant to create a more acceptable transition path. What happened next is 
that I didn't get a single response from Eric, so I continued hacking on 
it until it was complete.

The key difference is now that Eric refused the offered help, while Con 
was refused the help he needed to get his work integrated.

When Ingo posted his rewrite http://lkml.org/lkml/2007/4/13/180, Con had 
already pretty much lost. I have no doubt that Ingo can quickly transform 
an idea into working code, and I would've been very surprised if he 
hadn't been able to turn it into something technically superior. When Ingo 
figured out how to implement fair scheduling in a better way, he didn't 
use this idea to help Con improve his work. He decided instead to 
work against Con and started his own rewrite; this is of course his right, 
but then he should also accept responsibility for the fact that Con felt his 
years of work were ripped apart and in vain, and that we have now lost a 
developer who tried to address things from a different perspective.

bye, Roman


CFS review

2007-07-31 Thread Roman Zippel
Hi,

On Sat, 14 Jul 2007, Mike Galbraith wrote:

> > On Fri, 13 Jul 2007, Mike Galbraith wrote:
> > 
> > > > The new scheduler does _a_lot_ of heavy 64 bit calculations without any 
> > > > attempt to scale that down a little...
> > > 
> > > See prio_to_weight[], prio_to_wmult[] and sysctl_sched_stat_granularity.
> > > Perhaps more can be done, but "without any attempt..." isn't accurate.
> > 
> > Calculating these values at runtime would have been completely insane, the 
> > alternative would be a crummy approximation, so using a lookup table is 
> > actually a good thing. That's not the problem.
> 
> I meant see usage.

I more meant serious attempts. At this point I'm not that much interested 
in a few localized optimizations; what I'm interested in is how this can be 
optimized at the design level (e.g. how arch information can be used to 
simplify things). So I spent quite a bit of time looking through cfs and 
experimenting with some ideas. I want to put the main focus on the 
performance aspect, but there are a few other issues as well.


But first something else (especially for Ingo): I tried to be very careful 
with any claims made in this mail, but this of course doesn't exclude the 
possibility of errors, in which case I'd appreciate any corrections. Any 
explanations done in this mail don't imply that anyone needs any such 
explanations, they're done to keep things in context, so that interested 
readers have a chance to follow even if they don't have the complete 
background information. Any suggestions made don't imply that they have to 
be implemented like this, they are more an incentive for further 
discussion and I'm always interested in better solutions.


A first indication that something may not be quite right is the increase
in code size:

2.6.22:
   text    data     bss     dec     hex filename
  10150      24    3344   13518    34ce kernel/sched.o

recent git:
   text    data     bss     dec     hex filename
  14724     228    2020   16972    424c kernel/sched.o

That's i386 without stats/debug. A lot of the new code is in regularly
executed regions and it's often not exactly trivial code as cfs added
lots of heavy 64bit calculations. With the increased text comes
increased runtime memory usage, e.g. task_struct increased so that only
5 of them instead of 6 now fit into 8KB.

Since sched-design-CFS.txt doesn't really go into any serious detail, the
EEVDF paper was more helpful, and after playing with the ideas a little I
noticed that the whole idea of fair scheduling can be explained somewhat
more simply; I'm a little surprised not to find it mentioned anywhere.
So a different view on this is that the runtime of a task is simply
normalized and the virtual time (or fair_clock) is the weighted average of
these normalized runtimes. The advantage of normalization is that it
makes things comparable, once the normalized time values are equal each
task got its fair share. It's more obvious in the EEVDF paper, cfs makes
it a bit more complicated, as it uses the virtual time to calculate the
eligible runtime, but it doesn't maintain a per process virtual time
(fair_key is not quite the same).
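
To make the normalization idea a bit more concrete, here is a sketch with
made-up names (only an illustration of the bookkeeping described above,
not the actual cfs code):

/*
 * Sketch: a task's raw runtime is scaled by the nice-0 weight over its
 * own weight; the virtual clock advances by the total raw runtime of the
 * queue scaled by the nice-0 weight over the total weight, which is
 * exactly the weighted average of the per-task normalized runtimes.
 */
#define NICE_0_WEIGHT	1024ULL

static unsigned long long
normalized_runtime(unsigned long long delta_exec, unsigned long weight)
{
	return delta_exec * NICE_0_WEIGHT / weight;
}

static unsigned long long
fair_clock_delta(unsigned long long total_delta_exec, unsigned long total_weight)
{
	return total_delta_exec * NICE_0_WEIGHT / total_weight;
}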

Here we get to the first problem, cfs is not overly accurate at
maintaining a precise balance. First, there are a lot of rounding errors
due to the constant conversion between normalized and non-normalized
values, and the higher the update frequency, the bigger the error. The
effect of
this can be seen by running:

while (1)
	sched_yield();

and watching the sched_debug output: the underrun counter goes crazy.
cfs thus needs the limiting to keep this misbehaviour under control. The
problem here is that it's not that difficult to hit one of the many
limits, which may change the behaviour and makes it hard to predict how
cfs will behave in different situations.

The next issue is scheduler granularity: here I don't quite understand
why the actual running time has no influence at all, which makes it
difficult to predict how much cpu time a process will get at a time
(even the comments only refer to the vmstat output). What is basically
used instead is the normalized time since the task was enqueued; in
practice it's a bit more complicated, as fair_key is not entirely a
normalized time value. If the wait_runtime value is positive, higher
prioritized tasks are given even more priority than they already get
from their larger wait_runtime value. The problem here is that this
triggers underruns and lower priority tasks get even less time.

Another issue is the sleep bonus given to sleeping tasks. A problem here
is that this can be exploited: if a job is spread over a few threads,
they can get more time relative to other tasks. E.g. in this example
there are three tasks that run only for about 1ms every 3ms, but they
get far more time than they should have gotten fairly:

 4544 roman 20   0  1796  520  432 S 32.1  0.4   0:21.08 lt
 4545 roman 20   0  1796  344  256 R 32.1  0.3   0:21.07 lt
 4546 roman 20   0  1796  344 
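
For reference, a minimal userspace sketch of such a load pattern (an
illustration of the effect, not the actual test program above): each task
busy-loops for roughly 1ms and then sleeps for roughly 2ms, so it keeps
looking like an interactive sleeper to the scheduler.

#include <time.h>
#include <unistd.h>

/* Busy-loop for roughly 1ms of wall-clock time. */
static void burn_one_ms(void)
{
	struct timespec start, now;

	clock_gettime(CLOCK_MONOTONIC, &start);
	do {
		clock_gettime(CLOCK_MONOTONIC, &now);
	} while ((now.tv_sec - start.tv_sec) * 1000000000LL +
		 (now.tv_nsec - start.tv_nsec) < 1000000);
}

int main(void)
{
	for (;;) {
		burn_one_ms();
		usleep(2000);	/* sleep ~2ms and collect the sleep bonus */
	}
}

Run three instances of this next to a plain cpu hog and compare the top
output.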


Re: [PATCH] i2o: defined but not used.

2007-07-26 Thread Roman Zippel
Hi,

On Saturday 21 July 2007, Andrew Morton wrote:

> On Sat, 21 Jul 2007 00:58:01 +0200 Sebastian Siewior 
<[EMAIL PROTECTED]> wrote:
> > Got with randconfig
>
> randconfig apparently generates impossible configs.  Please always
> run `make oldconfig' after the randconfig, then do the test build.

If that should make any difference, that would be a bug and I'd like to see 
that .config.

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] LinuxPPS - definitive version

2007-07-26 Thread Roman Zippel
Hi,

On Tuesday 24 July 2007, Rodolfo Giometti wrote:

> By doing:
>
>  struct pps_ktime {
> __u64 sec;
> -   __u32 nsec;
> +   __u64 nsec;
>  };

Just using __u32 for both works as well...

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] CFS: Fix missing digit off in wmult table

2007-07-20 Thread Roman Zippel
Hi,

On Wed, 18 Jul 2007, Ingo Molnar wrote:

> > Why do you constantly stress level 19? Yes, that one is special, all 
> > other positive levels were already relatively consistent.
> 
> i constantly stress it for the reason i mentioned a good number of 
> times: because it's by far the most commonly used (and complained about) 
> nice level. =B-)

How do you know that? Does "most complained about" make it most commonly used?

> but because you are asking, i'm glad to give you some first-hand 
> historic background about Linux nice levels (in case you are interested) 
> and the motivations behind their old and new implementations:

I guess I should be thankful now?
I'm curious why you post this now, after I "asked" about this. Most of the 
information is either rather generic or not specific enough for the 
problem at hand. If you had posted this information earlier, it would have 
been far more valuable, as it could have been a nice base for a discussion.
But with you posting it this late, I can't shake the feeling you're more 
interested in "teaching" me.

> nice levels were always so weak under Linux (just read Peter's report) 

-ENOLINK

> Hope this helps,

Not completely.

For negative nice levels you mentioned audio apps, but these aren't really 
interested in a fair share, they would use the higher percentage only to 
guarantee they get the amount of time they need independent of the 
current load. I think they would be better served with e.g. a deadline 
scheduler, which guarantees them an absolute time share not a relative 
one.
On the other end, with positive levels, I remember more requests for 
something closer to idle scheduling, where a process only runs when 
nothing else is running.

So assuming we had scheduling classes for the above use cases, what other 
reasons are left for such extreme nice levels?

My proposed nice levels have otherwise the same properties as yours (e.g. 
being consistent). There is one property you haven't commented on at all 
yet: my proposed levels give the average user a far better idea of what they 
actually mean, i.e. that every 5 levels the cpu time a process gets is 
doubled/halved. This is IMO a considerable advantage.

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] CFS: Fix missing digit off in wmult table

2007-07-20 Thread Roman Zippel
Hi,

On Wed, 18 Jul 2007, Ingo Molnar wrote:

> > > [more rude insults deleted]
> > > I've been waiting for that obvious question, and i _might_ be able
> > > to answer it, but somehow it never occured to you ;-) Thanks,
> 
> the ";-)" emoticon (and its contents) clearly signals this as a 
> sarcastic, tongue-in-cheek remark.

To take another example of why this is still insulting and inappropriate, 
this is behaviour I would characterize as school bullying:
A bully attacks someone obviously weaker than himself, for example takes 
something away and then continues with "If you ask nicely I'll give it back 
to you.", often accompanied by laughter to signal he's enjoying himself and 
the power he has, but for the other person it's anything but funny.

Maybe you don't know what it feels like, but I do and I can't find 
anything funny, sarcastic or whatever about this, no matter how many 
smileys or other tags you add there. If the communication is already that 
troubled as this, such "humor" is really the worst thing you can do and I 
find it rather sad that you can't realize this yourself.

> ok? (If you didnt see/read it as sarcastic straight away then my 
> apologies for insulting you!)

Sorry, that is too little too late. You've apologized before and you 
continued to make fun of me personally to the point of spreading wrong 
information about me, which you could have very easily verified yourself, 
if you only wanted.
What I want from you is that you treat me with respect and to keep your 
"sarcasm" to yourself.

I told you very clearly how I think about you requoting this crap and yet 
you repeat it again _twice_, so on the one hand I get this apology attempt 
and on the other hand you continue to kick me in the crotch? How do you 
think I am supposed to feel about this?

It's also always interesting what you don't respond to. I asked you for 
examples which would prove the (rather strong) assertions you made about 
me, what does it tell me now if you can't back up your statements?

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH/RFC] msleep() with hrtimers

2007-07-20 Thread Roman Zippel
Hi,

On Mon, 16 Jul 2007, Jonathan Corbet wrote:

> > That's a bit my problem - we have to consider other setups as well.
> > Is it worth converting all msleep users behind their back or should we 
> > just a provide a separate function for those who care?
> 
> Any additional overhead is clearly small - enough not to disrupt a *high
> resolution* timer, after all.

If you already use high resolution timers, you also need a fast time 
source, so in that case it indeed doesn't matter much how you sleep.

>  And msleep() is used mostly during
> initialization time.  My box had a few hundred calls, almost all during
> boot.  Any cost will be bounded by the fact that, well, it sleeps for
> milliseconds on every call.

Well, there are over 1500 msleep calls, so I'm not sure they're mostly 
during initialization.

> I'm not *that* attached to this patch; if it causes heartburn we can
> just forget it.  But I had thought it might be useful...

I'm not against using hrtimers in drivers; if you add an hrsleep() function 
and use that, that would be perfectly fine.
The really important point is to keep our APIs clean, so it's obvious who 
is using what. The requirements for both timer types are different, so there 
should be a choice of which to use.
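
Just to illustrate the kind of separate interface I mean (a sketch only,
assuming a schedule_hrtimeout()-style helper is available; this is not the
patch under discussion):

#include <linux/hrtimer.h>
#include <linux/ktime.h>
#include <linux/sched.h>

/* Sleep for msecs milliseconds, explicitly backed by hrtimers. */
void hrsleep(unsigned int msecs)
{
	ktime_t delay = ktime_set(msecs / 1000,
				  (msecs % 1000) * NSEC_PER_MSEC);

	set_current_state(TASK_UNINTERRUPTIBLE);
	schedule_hrtimeout(&delay, HRTIMER_MODE_REL);
}

Drivers that really need the precision could then call hrsleep(), and
everyone else keeps the cheap timer wheel based msleep().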

> > Which driver is this? I'd like to look at this, in case there's some other 
> > hidden problem. 
> 
> drivers/media/video/cafe_ccic.c, and cafe_smbus_write_data() in
> particular.  The "hidden problem," though, is that the hardware has
> periods where reading the status registers will send it off into its
> room where it will hide under its bed and never come out.

It's indeed not a trivial problem, as it's not localized to the driver 
(the request comes from generic code).
The most elegant and general solution might be to move such initialization 
sequences into a separate thread, where they don't hold up the rest.

> My understanding is that the current dyntick code only turns off the
> tick during idle periods; while things are running it's business as
> usual.  Perhaps I misunderstood?

jiffies needs to be updated; theoretically one could reduce the timer 
tick even then, but one has to be careful about the increased resolution, 
as jiffies+1 is no longer enough to round a timeout up.
In general it's doable by further cleaning up our APIs, but here it's 
really important to keep the APIs clean to keep Linux running on a wide 
range of hardware. It should be clear whether one requests a low 
resolution, but low overhead timer or a high resolution and more precise 
timer (and _please_ ignore that "likely to expire" stuff).
It's e.g. possible to map everything to high resolution timers on hardware 
which can deal with this, but on other hardware that's not possible without 
paying a significant price.

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] CFS: Fix missing digit off in wmult table

2007-07-18 Thread Roman Zippel
Hi,

On Wed, 18 Jul 2007, Ingo Molnar wrote:

> _changing_ it is an option within reason, and we've done it a couple of 
> times already in the past, and even within CFS (as Peter correctly 
> observed) we've been through a couple of iterations already. And as i 
> mentioned it before, the outer edge of nice levels (+19, by far the most 
> commonly used nice level) was inconsistent to begin with: 3%, 5%, 9% of 
> nice-0, depending on HZ.

Why do you constantly stress level 19? Yes, that one is special, all other 
positive levels were already relatively consistent.

> So changing that to a consistent (and 
> user-requested)

How old is CFS and how many users has it had so far? How many users does 
the old scheduler have, who will be exposed to the new one soon?

> 1.5% is a much smaller change than you seem to make it 
> out to be.

The percentage levels are off by a factor of up to _seven_; sorry, I fail 
to see how you can characterize this as "small".

> So by your standard we could never change the 
> scheduler. (which your ultimate argument might be after all =B-)

Careful, you are making assertions about me for which you have absolutely 
no basis; adding a smiley doesn't make this any funnier.

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] CFS: Fix missing digit off in wmult table

2007-07-18 Thread Roman Zippel
Hi,

On Wed, 18 Jul 2007, Peter Zijlstra wrote:

> The only expectation is that a process with a lower nice level gets more
> time. Any other expectation is a bug.

Yes, users are buggy, they expect a lot of stupid things...
Is this really reason enough to break this?

What exactly is the damage if setpriority() accepts a few more levels?

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] CFS: Fix missing digit off in wmult table

2007-07-18 Thread Roman Zippel
Hi,

On Wed, 18 Jul 2007, Peter Zijlstra wrote:

> By breaking the UNIX model of nice levels. Not an option in my book.

BTW what is the "UNIX model of nice levels"?

SUS specifies the limit via NZERO, which is defined as "Minimum Acceptable 
Value: 20"; I can't find any information that it must be exactly 20.

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] CFS: Fix missing digit off in wmult table

2007-07-18 Thread Roman Zippel
Hi,

On Wed, 18 Jul 2007, Peter Zijlstra wrote:

> By breaking the UNIX model of nice levels. Not an option in my book.

Breaking user expectations of nice levels is?

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] CFS: Fix missing digit off in wmult table

2007-07-18 Thread Roman Zippel
Hi,

On Wed, 18 Jul 2007, Peter Zijlstra wrote:

> I actually like the extra range, it allows for a much softer punch of
> background tasks even on somewhat slower boxen.

The extra range is not really a problem, in 

http://www.ussg.iu.edu/hypermail/linux/kernel/0707.2/0850.html

I suggested how we can have both.

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] CFS: Fix missing digit off in wmult table

2007-07-18 Thread Roman Zippel
Hi,

On Wed, 18 Jul 2007, Ingo Molnar wrote:

> > > Roman, please do me a favor, and ask me the following question:
> > > 
> > >  [insult deleted]


> In this discussion about 
> nice levels you were (very) agressively asserting things that were 
> untrue,

Instead of simply asserting things, how about you provide some examples?
So far I have made a single mistake of mixing up nice levels 18 and 19.
If you would point me to such examples, I could learn how to tone it down 
a little, since the nice levels are not the only issue I have with the new 
scheduler; the heavy stuff is still to come. The problem here is that there 
is so much burnt ground that I can't just present raw ideas, which would get 
flamed by you; I have to be sufficiently confident they are valid, which you 
might then interpret as "aggressive assertion".

> you were suggesting that i dont understand the code,

Again, please point me to examples, so I at least have a chance to clear 
things up; it was never my intention to make such a suggestion, but without 
examples I have no chance to defend myself.

OTOH I can tell you exactly how you continuously insult me, e.g. by 
suggesting I ask "stupid questions" or that I'm in "denial of facts".
Don't make such suggestions if you have no idea how insulting they are. 
Especially the one deleted insult above, which you have the impertinence to 
quote: such a tone is more appropriate between lord and inferior, where 
the latter has to make a request and the former "might" grant it. 
_Never_ make me beg. :-(

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] CFS: Fix missing digit off in wmult table

2007-07-17 Thread Roman Zippel
Hi,

On Tue, 17 Jul 2007, Ingo Molnar wrote:

> * Roman Zippel <[EMAIL PROTECTED]> wrote:
> 
> > > > It's nice that these artifacts are gone, but that still doesn't 
> > > > explain why this ratio had to be increase that much from around 
> > > > 1:10 to 1:69.
> > > 
> > > More dynamic range is better? If you actually want a task to get 20x 
> > > the CPU time of another, the older scheduler doesn't really allow 
> > > it.
> > 
> > You can already have that, the complete range level from 19 to -20 was 
> > about 1:80.
> 
> But that is irrelevant: all tasks start out at nice 0, and what matters 
> is the dynamic range around 0.
> 
> So the dynamic range has been made uniform in the positive from 
> 1:10...1:20...1:30 to 1:69 for nice +19, and from 1:8 to 1:69 in the 
> minus. (with 1:86 nice -20) If you look at the negative nice levels 
> alone it's a substantial increase but if you compare it with positive 
> nice levels you'll similar kinds of dynamic ranges were already present 
> in the old scheduler and you'll see why we've done it.

So let's look at them:

for (i=0;i<20;i++) print i, " : ", (20-i)*5, " : ", 100*1.25^-i, " : ", 
e(l(2)*(-i/5))*100, "\n";
0 : 100 : 100 : 100.
1 : 95 : 80. : 87.05505632961241391300
2 : 90 : 64. : 75.78582832551990411700
3 : 85 : 51.2000 : 65.97539553864471296900
4 : 80 : 40.9600 : 57.43491774985175034000
5 : 75 : 32.7680 : 50.
6 : 70 : 26.2144 : 43.52752816480620695700
7 : 65 : 20.97152000 : 37.89291416275995205900
8 : 60 : 16.77721600 : 32.98769776932235648400
9 : 55 : 13.42177280 : 28.71745887492587517000
10 : 50 : 10.73741824 : 25.
11 : 45 : 8.589934592000 : 21.76376408240310347800
12 : 40 : 6.871947673600 : 18.94645708137997602900
13 : 35 : 5.497558138880 : 16.49384888466117824200
14 : 30 : 4.398046511104 : 14.35872943746293758500
15 : 25 : 3.5184372088832000 : 12.5000
16 : 20 : 2.8147497671065600 : 10.88188204120155173900
17 : 15 : 2.2517998136852480 : 9.47322854068998801400
18 : 10 : 1.8014398509481984 : 8.24692444233058912100
19 : 5 : 1.44115188075855872000 : 7.17936471873146879200

(nice level : old % : new % : my suggested %)

Your levels diverge very quickly from what they used to be (up to a factor 
of 7), and it's also not really easy to remember what the individual levels 
mean.
I at least try to keep them somewhat in the range they used to be (the 
difference is limited to a factor of about 2), and every 5 levels the 
amount of cpu time is halved, which is very easy to remember.

If you need more dynamic range, is there a law that prevents us from going 
beyond 19? For example:

for (i=20;i<=30;i++) print i, " : ", (20-i)*5, " : ", 100*1.25^-i, " : ", 
e(l(2)*(-i/5))*100, "\n";
20 : 0 : 1.15292150460684697600 : 6.2500
21 : -5 : .92233720368547758000 : 5.44094102060077586900
22 : -10 : .73786976294838206400 : 4.73661427034499400700
23 : -15 : .59029581035870565100 : 4.12346222116529456000
24 : -20 : .47223664828696452100 : 3.58968235936573439600
25 : -25 : .37778931862957161700 : 3.1250
26 : -30 : .30223145490365729300 : 2.72047051030038793400
27 : -35 : .24178516392292583400 : 2.36830713517249700300
28 : -40 : .19342813113834066700 : 2.06173111058264728000
29 : -45 : .15474250491067253400 : 1.79484117968286719800
30 : -50 : .12379400392853802700 : 1.5625

setpriority() accepts such values without error.
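
A quick userspace check of that last point (illustrative only):

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
	/* Request a nice value beyond the usual limit of 19; the call
	 * itself does not return an error, which is the point above. */
	if (setpriority(PRIO_PROCESS, 0, 30) != 0)
		perror("setpriority");
	else
		printf("setpriority(30) succeeded, getpriority() now says %d\n",
		       getpriority(PRIO_PROCESS, 0));
	return 0;
}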

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] CFS: Fix missing digit off in wmult table

2007-07-17 Thread Roman Zippel
Hi,

On Tue, 17 Jul 2007, Ingo Molnar wrote:

> Roman, please do me a favor, and ask me the following question:
> 
>  " Ingo, you've been maintaining the scheduler for years. In fact you 
>wrote the old nice code we are talking about here. You changed it a
>number of times since then. So you really know what's going on here. 
>Why does the old nice code behave like that for nice +19 levels? "
> 
> I've been waiting for that obvious question, and i _might_ be able to 
> answer it, but somehow it never occured to you ;-) Thanks,

Do you have any idea how insulting and arrogant this is?
Let me translate for you, how this arrived:

"O Ingo, who art our god of the scheduler. You have blessed the paths I 
walked in. You kept me from sinning numerous times. Your wisdom is 
infinite. Guide me on the journey that layeth ahead of me into this world 
knowledge of Your truth."

(I apologize already in advance, if I should have hurt anyones religious 
feelings.)

It's obvious that you have more experience with the scheduler code, but 
does that make you infallible? Does that give you the right to act like a 
jerk?
I do make mistakes, I try to learn from them and life goes on, I have no 
problem with that, but what I do have a problem with is someone abusing 
this to his own advantage. I have to be extremely careful what I say to 
you, because you jump on the first small mistake and I have to bear your 
insults like "there's nothing i can do about your denial of facts - that 
is your own private problem." I have no problems with facts, I'm only 
trying very hard to ignore your arrogant behaviour...
If you have something to contribute to this discussion which might clear 
things up, then just say it, but I'm not going to beg for it.

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] CFS: Fix missing digit off in wmult table

2007-07-16 Thread Roman Zippel
Hi,

On Tue, 17 Jul 2007, I wrote:

> Playing around with some other nice levels, confirms the theory that 
> something is a little off, so I'm quite correct at saying that the ratio 
> _should_ be 1:10.

Rechecking everything, there was actually a small error in my test program, 
so the ratio should be 1:20. Sorry about that mistake.
Nice level 19 shows the largest artifacts, as that level only gets a 
single tick, so the ratio is often 1:HZ/10 (except for 1000HZ where it's 
5:100). Nevertheless it's still true that in general nice levels were 
independent of HZ (that's all I wanted to say a couple of mails ago).

Ingo, you can start gloating now, but contrary to you I have no problem 
with admitting mistakes and apologizing for them. The point is just that 
I react better to factual arguments than to flames (and I think it's not 
just me), so I'm pretty sure I'm still correct about this:

> OTOH you are the one who is wrong about me (again). :-(

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] CFS: Fix missing digit off in wmult table

2007-07-16 Thread Roman Zippel
Hi,

On Tue, 17 Jul 2007, Ingo Molnar wrote:

> * Roman Zippel <[EMAIL PROTECTED]> wrote:
> 
> > Hi,
> > 
> > On Mon, 16 Jul 2007, Ingo Molnar wrote:
> > 
> > > and note that even on the old scheduler, nice-0 was "3200% more 
> > > powerful" than nice +19 (with CONFIG_HZ=300),
> > 
> > How did you get that value? At any HZ the ratio should be around 1:10 
> > (+- rounding error).
> 
> you are wrong again. I sent you the numbers earlier today already:
> 
> |   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
> |  2332 mingo 25   0  1580  248  196 R 95.1  0.0   0:11.84 loop
> |  2335 mingo 39  19  1576  244  196 R  3.1  0.0   0:00.39 loop
> 
> 3.1% is 3067% more than 95.1%, and the ratio is 1:30.67. You again deny 
> above that this is the case, and there's nothing i can do about your 
> denial of facts - that is your own private problem.

Ingo, how am I supposed to react to this? I'm asking a simple question
and I get this? I'm at a serious loss as to how to deal with you. :-(

The above is based on theoretical values: for a 300HZ kernel these two 
processes should get 30 and 3 ticks. Should there be any rounding error or 
off-by-one error, so that a process gets one tick less than it should or 
one tick is accounted to the wrong process, my theoretical value is still 
within the possible error range and doesn't contradict your practical 
values.
Playing around with some other nice levels confirms the theory that 
something is a little off, so I'm quite correct in saying that the ratio 
_should_ be 1:10.
OTOH you are the one who is wrong about me (again). :-(

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] CFS: Fix missing digit off in wmult table

2007-07-16 Thread Roman Zippel
Hi,

On Mon, 16 Jul 2007, Ingo Molnar wrote:

> and note that even on the old scheduler, nice-0 was "3200% more 
> powerful" than nice +19 (with CONFIG_HZ=300),

How did you get that value? At any HZ the ratio should be around 1:10
(+- rounding error).

> in fact i like it that nice -20 has a slightly bigger punch than it used 
> to have before:

"Slightly bigger"??? You're joking, right?
Especially the user levels are doing something completely different now, 
which may break user expectations. While the user couldn't expect anything 
precise, it's still a big difference whether a process at nice 5 gets 75% 
of the time or only 30%.

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] CFS: Fix missing digit off in wmult table

2007-07-16 Thread Roman Zippel
Hi,

On Mon, 16 Jul 2007, Matt Mackall wrote:

> > It's nice that these artifacts are gone, but that still doesn't explain 
> > why this ratio had to be increased that much, from around 1:10 to 1:69.
> 
> More dynamic range is better? If you actually want a task to get 20x
> the CPU time of another, the older scheduler doesn't really allow it.

You can already have that: the complete range from nice 19 to -20 was 
about 1:80.
There is also such a thing as too much range. I tried it with top at nice 
19, and as soon as something runs at -20 it's practically dead, because it 
now gets only 1/5900 of the cpu time.

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] CFS: Fix missing digit off in wmult table

2007-07-16 Thread Roman Zippel
Hi,

On Mon, 16 Jul 2007, Linus Torvalds wrote:

> How about trying a much less aggressive nice-level (and preferably linear, 
> not exponential)?

I think the exponential increase isn't the problem. The old code 
approximated something like this rather crudely, with the result that 
there was a big gap between level 0 and -1.

Something like this:

echo 'for (i=-20;i<=20;i++) print i, " : ", 1024*e(l(2)*(-i/20*3)), "\n";' | bc -l

would produce a range similar to the old code. Replacing the factor 3 
with 4 would IMO be a more reasonable increase and would have the 
advantage for the user that it's easy to understand: every 5 levels, the 
time a process gets is doubled.
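
For convenience, the same table as a small standalone C program (build 
with gcc -lm; purely illustrative). The first column is the factor 3 
curve, the second the factor 4 variant suggested above, which doubles 
every 5 levels:

#include <math.h>
#include <stdio.h>

int main(void)
{
	int i;

	for (i = -20; i <= 19; i++)
		printf("%3d : %7.1f %7.1f\n", i,
		       1024.0 * pow(2.0, -i * 3.0 / 20.0),  /* factor 3 */
		       1024.0 * pow(2.0, -i * 4.0 / 20.0)); /* factor 4 */
	return 0;
}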

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH/RFC] msleep() with hrtimers

2007-07-16 Thread Roman Zippel
Hi,

On Mon, 16 Jul 2007, Ingo Molnar wrote:

> I explained it numerous times (remember the 'timeout' vs. 'timer event' 
> discussion?) that i consider timer granularity important to scalability. 
> Basically, in every case where we know with great certainty that a 
> time-out will _not_ occur (where the time-out is in essence just an 
> exception handling mechanism), using struct timer_list is the best 
> solution.

Whether the timer expires or not is in many cases completely irrelevant.
It takes special cases for the timer wheel behaviour to become an issue, 
and whether hrtimers would behave any better in such situations is 
questionable.
Again, for the average user such details are pretty much irrelevant.

> what i consider harmful on the other hand are all the HZ assumptions 
> embedded into various pieces of code. The most harmful ones are design 
> details that depend on HZ and kernel-internal API details that depends 
> on HZ. Yes, NTP was such an example, and it was hard to fix, and you 
> didnt help much with that.

Stop spreading lies! :-(
One only has to look at the history of kernel/time/ntp.c.
John's rather simple "HZ free ntp" patch wouldn't have been that simple 
without all the cleanup patches I did before it, which were precisely 
intended to make this possible.

> (perhaps that is one source of this 
> increasingly testy exchange ;-)

No, it's your prejudice against me based on wrong facts.
Get your facts straight and stop being an ass.

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH/RFC] msleep() with hrtimers

2007-07-16 Thread Roman Zippel
Hi,

On Mon, 16 Jul 2007, Jonathan Corbet wrote:

> > One possible problem here is that setting up that timer can be 
> > considerably more expensive, for a relative timer you have to read the 
> > current time, which can be quite expensive (e.g. your machine now uses the 
> > PIT timer, because TSC was deemed unstable).
> 
> That's a possibility, I admit I haven't benchmarked it.  I will say that
> I don't think it will be enough to matter - msleep() is not a hot-path
> sort of function.  Once the system is up and running it almost never
> gets called at all - at least, on my setup.

That's part of my problem - we have to consider other setups as well.
Is it worth converting all msleep users behind their back, or should we 
just provide a separate function for those who care?
I would really like to keep hrtimer and kernel timer separate and make it 
obvious who is using what, as the usage requirements are somewhat 
different.

> > One question here would be, is it really a problem to sleep a little more?
> 
> "A little more" is a bit different than "twenty times as long as you
> asked for."  That "little bit more" added up to a few seconds when
> programming a device which needs a brief delay after tweaking each of
> almost 200 registers.

Which driver is this? I'd like to look at this, in case there's some other 
hidden problem. 

> > BTW there is another thing to consider. If you already run with hrtimer/ 
> > dyntick, there is not much reason to keep HZ at 100, so you could just 
> > increase HZ to get the same effect.
> 
> Except that then, with the current implementation, you're paying for the
> higher HZ whenever the CPU is busy.  I bet that doesn't take long to
> overwhelm any added overhead in the hrtimer msleep().

Actually, if that's the case I'd consider this a bug - where is that 
extra cost coming from?

> In the end, I did this because I thought msleep() should do what it
> claims to do, because I thought that getting a known-to-expire timeout
> off the timer wheel made sense, and to make a tiny baby step in the
> direction of reducing the use of jiffies in the core code.

I know that Ingo considers everything HZ-related evil, but it really is 
not - it keeps Linux scalable. Unless you need the high resolution, the 
timer wheel's performance is still pretty hard to beat. That 
"known-to-expire" stuff is really the least significant problem to 
consider here, please just forget about it.
I don't want to keep anyone from using hrtimers - if it's just some 
driver, go wild - but in generic code we have to consider portability 
issues. Using jiffies as a time base is still unbeatably cheap in the 
general case, so we have to carefully consider whether using a different 
time source is required. There is nothing wrong with using jiffies if it 
fits the bill, and in many cases it still does.
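
To make the cost comparison a bit more concrete, arming a timer wheel 
timer is little more than an addition on jiffies plus a list insert - no 
clock read is needed. A minimal sketch (the names are made up, only the 
standard timer_list API is used):

#include <linux/jiffies.h>
#include <linux/timer.h>

static struct timer_list my_timer;

/* Exception path - in the common case this never runs. */
static void my_timeout_fn(unsigned long data)
{
}

static void arm_cheap_timeout(void)
{
	setup_timer(&my_timer, my_timeout_fn, 0);
	/* jiffies is just a variable, no clocksource access required */
	mod_timer(&my_timer, jiffies + msecs_to_jiffies(10));
}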

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH/RFC] msleep() with hrtimers

2007-07-16 Thread Roman Zippel
Hi,

On Mon, 16 Jul 2007, Ingo Molnar wrote:

> because when i assumed the obvious, you called it an 
> insult so please dont leave any room for assumptions and remove any 
> ambiguity - especially as our communication seems to be marred by what 
> appears to be frequent misunderstandings ;-)

What the hell is this supposed to be? How am I not to take this:

"i did not want to embarrass you (and distract the discussion) with
answering a pretty stupid, irrelevant question"

as an insult? How does it reflect on someone if he asks "stupid, 
irrelevant questions"? If it had been a misunderstanding, you could have 
asked appropriately instead of assuming I'm an idiot.
BTW, adding smileys to it doesn't make it any funnier, since I don't 
believe in the "misunderstandings" theory anymore. :-(

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH/RFC] msleep() with hrtimers

2007-07-16 Thread Roman Zippel
Hi,

On Sun, 15 Jul 2007, Jonathan Corbet wrote:

> The OLPC folks and I recently discovered something interesting: on a
> HZ=100 system, a call to msleep(1) will delay for about 20ms.  The
> combination of jiffies timekeeping and rounding up means that the
> minimum delay from msleep will be two jiffies, never less.  That led to
> multi-second delays in a driver which does a bunch of short msleep()
> calls and, in response, a change to mdelay(), which will come back in
> something closer to the requested time.
> 
> Here's another approach: a reimplementation of msleep() and
> msleep_interruptible() using hrtimers.  On a system without real
> hrtimers this code will at least drop down to single-jiffy delays much
> of the time (though not deterministically so).  On my x86_64 system with
> Thomas's hrtimer/dyntick patch applied, msleep(1) gives almost exactly
> what was asked for.

BTW there is another thing to consider. If you already run with hrtimer/ 
dyntick, there is not much reason to keep HZ at 100, so you could just 
increase HZ to get the same effect.
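
For reference, the rounding being described comes from the stock 
jiffies-based msleep(), which is essentially this (from kernel/timer.c, 
quoted from memory, so treat it as a sketch):

void msleep(unsigned int msecs)
{
	/*
	 * msecs_to_jiffies() rounds up and the extra +1 guarantees at
	 * least the requested delay, so msleep(1) at HZ=100 sleeps for
	 * two jiffies, i.e. up to 20ms.
	 */
	unsigned long timeout = msecs_to_jiffies(msecs) + 1;

	while (timeout)
		timeout = schedule_timeout_uninterruptible(timeout);
}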

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] CFS: Fix missing digit off in wmult table

2007-07-16 Thread Roman Zippel
Hi,

On Mon, 16 Jul 2007, Ingo Molnar wrote:

> to sum it up: a nice +19 task (the most commonly used nice level in 
> practice) gets 9.1%, 3.9%, 3.1% of CPU time on the old scheduler, 
> depending on the value of HZ. This is quite inconsistent and illogical.

You're correct that you can find artifacts in the extreme cases; it's 
subjective whether this is a serious problem.
It's nice that these artifacts are gone, but that still doesn't explain 
why this ratio had to be increased that much, from around 1:10 to 1:69.

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH/RFC] msleep() with hrtimers

2007-07-16 Thread Roman Zippel
Hi,

On Mon, 16 Jul 2007, Ingo Molnar wrote:

> > Well, you cut out the major question from my initial mail:
> > One question here would be, is it really a problem to sleep a little more?
> 
> oh, i did not want to embarrass you (and distract the discussion) with 
> answering a pretty stupid, irrelevant question that has the obvious 
> answer even for the most casual observer: "yes, of course it really is a 
> problem to sleep a little more, read the description of the fine patch 
> you are replying to" ...

And your insults continue... :-(
I ask a simple question and try to explore alternative solutions and this 
is your contribution to it?

To put this into a little more context, this is the complete text you cut 
off:

| One question here would be, is it really a problem to sleep a little more?
| Another possibility would be to add another sleep function, which uses 
| hrtimer and could also take ktime argument.

So instead of considering this suggestion, you just read what you want out 
of what I wrote and turn everything into an insult. Nicely done, Ingo. :-(

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] CFS: Fix missing digit off in wmult table

2007-07-16 Thread Roman Zippel
Hi,

On Mon, 16 Jul 2007, Ingo Molnar wrote:

> > > > As soon as you add another loop the difference changes again, 
> > > > while it's always correct to say it gets 25% more cpu time [...]
> > > 
> > > yep, and i'll add the relative effect to the comment too.
> > 
> > Why did you cut off the rest of the sentence?
> 
> (no need to become hostile, i answered to that portion of your sentence 
> separately, which was logically detached from the other portion of your 
> sentence. I marked the cut with the '[...]' sign. )

Could you please stop with these accusations?
Could you please point me to the mail with the separate answer?

> > To illustrate the problem a little differently: a task with a nice level 
> > -20 got around 700% more cpu time (or 8 times more), now it gets 8500% 
> > more cpu time (or 86.7 times more). You don't think that change to the 
> > nice levels is a little drastic?
> 
> This was discussed on lkml in detail, see the CFS threads.

Those threads are quite big and I skipped most of them, so a more precise 
pointer would be appreciated.

> It has been a 
> common request for nice levels to be more logical (i.e. to make them 
> universal and to detach them from HZ) and for them to be more effective 
> as well.

Huh? What does this have to do with HZ? The scheduler used ticks 
internally, but that's irrelevant to what the user sees via the nice 
levels.
So the question still stands whether this change may be a little drastic, 
as you changed the nice levels of _all_ users, not just of those who were 
previously interested in CFS.

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH/RFC] msleep() with hrtimers

2007-07-16 Thread Roman Zippel
Hi,

On Mon, 16 Jul 2007, Ingo Molnar wrote:

> i'm not sure how your question relates/connects to what i wrote above, 
> could you please re-phrase your question into a bit more verbose form so 
> that i can answer it? Thanks,

Well, you cut out the major question from my initial mail:
One question here would be, is it really a problem to sleep a little more?

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] CFS: Fix missing digit off in wmult table

2007-07-16 Thread Roman Zippel
Hi,

On Mon, 16 Jul 2007, Ingo Molnar wrote:

> > As soon as you add another loop the difference changes again, while 
> > it's always correct to say it gets 25% more cpu time [...]
> 
> yep, and i'll add the relative effect to the comment too.

Why did you cut off the rest of the sentence?
To illustrate the problem a little differently: a task with a nice level -20 
got around 700% more cpu time (or 8 times more), now it gets 8500% more 
cpu time (or 86.7 times more).
You don't think that change to the nice levels is a little drastic?

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH/RFC] msleep() with hrtimers

2007-07-16 Thread Roman Zippel
Hi,

On Mon, 16 Jul 2007, Ingo Molnar wrote:

> i dont think there's any significant overhead. The OLPC folks are pretty 
> sensitive to performance,

How is a sleep function relevant to performance?

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH/RFC] msleep() with hrtimers

2007-07-16 Thread Roman Zippel
Hi,

On Sun, 15 Jul 2007, Jonathan Corbet wrote:

> Here's another approach: a reimplementation of msleep() and
> msleep_interruptible() using hrtimers.  On a system without real
> hrtimers this code will at least drop down to single-jiffy delays much
> of the time (though not deterministically so).  On my x86_64 system with
> Thomas's hrtimer/dyntick patch applied, msleep(1) gives almost exactly
> what was asked for.

One possible problem here is that setting up that timer can be 
considerably more expensive, for a relative timer you have to read the 
current time, which can be quite expensive (e.g. your machine now uses the 
PIT timer, because TSC was deemed unstable).
One question here would be, is it really a problem to sleep a little more?
Another possibility would be to add another sleep function, which uses 
hrtimer and could also take ktime argument.
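
To make the setup cost point concrete, an hrtimer-based sleep ends up 
doing something like the sketch below. This is not Jonathan's patch - the 
names and details are mine and only meant as an illustration:

#include <linux/hrtimer.h>
#include <linux/kernel.h>
#include <linux/ktime.h>
#include <linux/sched.h>
#include <linux/time.h>

struct hr_sleep_data {
	struct hrtimer timer;
	struct task_struct *task;
};

/* Timer callback: wake the sleeping task, don't rearm. */
static enum hrtimer_restart hr_sleep_wakeup(struct hrtimer *timer)
{
	struct hr_sleep_data *d =
		container_of(timer, struct hr_sleep_data, timer);

	wake_up_process(d->task);
	return HRTIMER_NORESTART;
}

static void hr_msleep_sketch(unsigned int msecs)
{
	struct hr_sleep_data d;

	hrtimer_init(&d.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
	d.timer.function = hr_sleep_wakeup;
	d.task = current;

	set_current_state(TASK_UNINTERRUPTIBLE);
	/* starting a relative hrtimer implies reading the current time */
	hrtimer_start(&d.timer, ktime_set(msecs / 1000,
					  (msecs % 1000) * NSEC_PER_MSEC),
		      HRTIMER_MODE_REL);
	schedule();
	hrtimer_cancel(&d.timer);
}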

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] CFS: Fix missing digit off in wmult table

2007-07-16 Thread Roman Zippel
Hi,

On Mon, 16 Jul 2007, Ingo Molnar wrote:

> yes, the weight multiplier 1.25, but the actual difference in CPU 
> utilization, when running two CPU intense tasks, is ~10%:
> 
>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
>  8246 mingo 20   0  1576  244  196 R   55  0.0   0:11.96 loop
>  8247 mingo 21   1  1576  244  196 R   45  0.0   0:10.52 loop
> 
> so the first task 'wins' +10% CPU utilization (relative to the 50% it 
> had before), the second task 'loses' -10% CPU utilization (relative to 
> the 50% it had before).

As soon as you add another loop the difference changes again, while it's 
always correct to say it gets 25% more cpu time (which I still think is a 
little too much).
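
The arithmetic behind both views, as a quick user-space check (my own 
illustration, not scheduler code): with a 1.25 weight step the absolute 
gap between two hogs is about 11%, but it shrinks as soon as a third hog 
joins, while the "25% more" relative statement stays true throughout:

#include <stdio.h>

int main(void)
{
	double hi = 1.25, lo = 1.0;

	/* two tasks, one nice level apart */
	printf("two hogs:   %.1f%% vs %.1f%%\n",
	       100.0 * hi / (hi + lo), 100.0 * lo / (hi + lo));

	/* a third task at the lower weight changes the gap again */
	printf("three hogs: %.1f%% vs %.1f%% each\n",
	       100.0 * hi / (hi + 2 * lo), 100.0 * lo / (hi + 2 * lo));
	return 0;
}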

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: x86 status was Re: -mm merge plans for 2.6.23

2007-07-13 Thread Roman Zippel
Hi,

On Fri, 13 Jul 2007, Mike Galbraith wrote:

> > The new scheduler does _a_lot_ of heavy 64 bit calculations without any 
> > attempt to scale that down a little...
> 
> See prio_to_weight[], prio_to_wmult[] and sysctl_sched_stat_granularity.
> Perhaps more can be done, but "without any attempt..." isn't accurate.

Calculating these values at runtime would have been completely insane, and 
the alternative would be a crummy approximation, so using a lookup table 
is actually a good thing. That's not the problem.
BTW, could someone please verify the prio_to_wmult table? Especially [16] 
and [21] look a little off, as if a digit was cut off.

While I'm at it: the 10% scaling there looks a little high (unless there 
are other changes I haven't looked at yet); the old code used more like 
5%. This would mean a prio -20 task gets 98.86% of the cpu time when 
competing with a prio 0 task, which was previously about the difference 
between -20 and 19 (against a prio 0 task it would previously have gotten 
only 88.89%); now a prio -20 task gets 99.98% of the cpu time compared to 
a prio 19 task.
The individual levels are unfortunately not that easily comparable, but on 
the overall scale the change looks IMHO a little drastic.
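
A quick standalone check of where those percentages come from (my own 
arithmetic, assuming a constant weight step of about 1.25 per level for 
the new code and roughly 1.055 for the old; build with gcc -lm):

#include <math.h>
#include <stdio.h>

/*
 * CPU share of one hog against another hog 'levels' nice levels away,
 * if each level multiplies the weight by 'step'.
 */
static double share(double step, int levels)
{
	double r = pow(step, levels);

	return 100.0 * r / (r + 1.0);
}

int main(void)
{
	printf("new, -20 vs  0: %.2f%%\n", share(1.25, 20));   /* ~98.86% */
	printf("new, -20 vs 19: %.2f%%\n", share(1.25, 39));   /* ~99.98% */
	printf("old, -20 vs 19: %.2f%%\n", share(1.055, 39));  /* ~88.9%  */
	return 0;
}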

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: x86 status was Re: -mm merge plans for 2.6.23

2007-07-12 Thread Roman Zippel
Hi,

On Wed, 11 Jul 2007, Linus Torvalds wrote:

> Sure, bugs happen, but code that everybody runs the same generally doesn't 
> break. So a CPU scheduler doesn't worry me all that much. CPU schedulers 
> are "easy".

A little more advance warning wouldn't have hurt though.
The new scheduler does _a_lot_ of heavy 64 bit calculations without any 
attempt to scale that down a little...
One can blame me now for not having brought it up earlier, but discussions 
with Ingo are not something I look forward to. :(

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Start to genericize kconfig for use by other projects.

2007-07-12 Thread Roman Zippel
Hi,

On Thu, 12 Jul 2007, I wrote:

> On Wed, 11 Jul 2007, Rob Landley wrote:
> 
> > Replace name "Linux Kernel" in menuconfig with a macro (defaulting to "Linux
> > Kernel" if not -Ddefined by the makefile), and remove a few unnecessary
> > occurrences of "kernel" in pop-up text.
> 
> Could you drop the PROJECT_NAME changes for now?

Or at least replace it with a variable at first.

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Start to genericize kconfig for use by other projects.

2007-07-12 Thread Roman Zippel
Hi,

On Wed, 11 Jul 2007, Rob Landley wrote:

> Replace name "Linux Kernel" in menuconfig with a macro (defaulting to "Linux
> Kernel" if not -Ddefined by the makefile), and remove a few unnecessary
> occurrences of "kernel" in pop-up text.

Could you drop the PROJECT_NAME changes for now? The rest looks fine.
I would prefer the project name to be settable via Kconfig.
If you want to play with it, add this to Kconfig:

config PROJECT_NAME
	string
	default "Linux kernel"

and at the end of conf_parse() you can look up, calculate and cache the 
value.
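
A rough sketch of that lookup step - the kconfig calls are from memory 
and the function is mine, so treat the details as illustrative:

#include "lkc.h"

static const char *project_name = "Linux Kernel";

/* Call this once, at the end of conf_parse(). */
static void cache_project_name(void)
{
	struct symbol *sym = sym_lookup("PROJECT_NAME", 0);

	if (sym) {
		sym_calc_value(sym);
		project_name = sym_get_string_value(sym);
	}
}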

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


<    1   2   3   4   5   6   7   8   9   10   >