Re: [patch] block: fix blktrace timestamps

2008-01-14 Thread Jens Axboe
On Mon, Jan 14 2008, Christoph Hellwig wrote:
> 
> Folks, this is getting a little silly.

It is

> Even if CONFIG_NO_HZ is new this is a an important regression, and
> yes we should avoid regressions wherever we can, and for such a quite
> important feature we should fix it.  On the other hand blktrace is using
> the wrong interface, and it has been told multiple times on lkml that
> sched_clock() should never ever be used outside the scheduler.  So the
> burden to fix this regression lies on the shoulders of the blktrace
> maintainer.

That is true. So far alternatives have all been slower though, so not
very tempting to transition to.

> No need for silly name calling here.

I don't know what thread you are reading, but neiter Ingo nor I have
resorted to silly name calling. The thread has long since diverted from
the original topic, though.

-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] block: fix blktrace timestamps

2008-01-14 Thread Christoph Hellwig

Folks, this is getting a little silly.

Even if CONFIG_NO_HZ is new this is a an important regression, and
yes we should avoid regressions wherever we can, and for such a quite
important feature we should fix it.  On the other hand blktrace is using
the wrong interface, and it has been told multiple times on lkml that
sched_clock() should never ever be used outside the scheduler.  So the
burden to fix this regression lies on the shoulders of the blktrace
maintainer.

No need for silly name calling here.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] block: fix blktrace timestamps

2008-01-14 Thread Jens Axboe
On Mon, Jan 14 2008, Ingo Molnar wrote:
> 
> * Ingo Molnar <[EMAIL PROTECTED]> wrote:
> 
> > because a perfectly working system is:
> > 
> >  "a user's .config that worked before should work with the new kernel
> >   too"
> > 
> > not:
> > 
> >  "a user's .config that worked before should work now too, with random
> >   new kernel features enabled as well."
> > 
> > the latter appears to be the rule you are applying, but it's not the 
> > regression rule we are using.
> 
> Jens, just to bring your definition of regressions to its logical 
> conclusion: does this mean that if there is any longstanding bug in the 
> block layer that you know about, but i didnt ever utilize that bit of 
> the block layer it in my .config, and if i enable it now in the .config 
> and i experience that bug, does it suddenly count as a regression? Do 
> you realize that your definition for "regressions" turns _almost every_ 
> current bug in the kernel into a regression?

Ingo, why do you keep harping this issue? I thought I suggested we agree
to disagree on this and let it rest.

And I would say that, yes, that is a regression, if that config option
is a core option that people are likely to enable. The CONFIG_NO_HZ is a
new option, people will select it. Your example pertain more to the 'use
mmio for IO operations' type options for drivers. If you enable that and
your driver suddenly stops working, you have a clear idea of WHY it
stops working and how to fix it. Not so with this blktrace scenario, I
bet that would take people quite a while to figure out how it broke.

Can we drop this subject now, please? The issue is resolved (and
merged), debating definitions of regressions is not very productive :-)

-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] block: fix blktrace timestamps

2008-01-14 Thread Ingo Molnar

* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> because a perfectly working system is:
> 
>  "a user's .config that worked before should work with the new kernel
>   too"
> 
> not:
> 
>  "a user's .config that worked before should work now too, with random
>   new kernel features enabled as well."
> 
> the latter appears to be the rule you are applying, but it's not the 
> regression rule we are using.

Jens, just to bring your definition of regressions to its logical 
conclusion: does this mean that if there is any longstanding bug in the 
block layer that you know about, but i didnt ever utilize that bit of 
the block layer it in my .config, and if i enable it now in the .config 
and i experience that bug, does it suddenly count as a regression? Do 
you realize that your definition for "regressions" turns _almost every_ 
current bug in the kernel into a regression?

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] block: fix blktrace timestamps

2008-01-14 Thread Jens Axboe
On Mon, Jan 14 2008, Ingo Molnar wrote:
 
 * Ingo Molnar [EMAIL PROTECTED] wrote:
 
  because a perfectly working system is:
  
   a user's .config that worked before should work with the new kernel
too
  
  not:
  
   a user's .config that worked before should work now too, with random
new kernel features enabled as well.
  
  the latter appears to be the rule you are applying, but it's not the 
  regression rule we are using.
 
 Jens, just to bring your definition of regressions to its logical 
 conclusion: does this mean that if there is any longstanding bug in the 
 block layer that you know about, but i didnt ever utilize that bit of 
 the block layer it in my .config, and if i enable it now in the .config 
 and i experience that bug, does it suddenly count as a regression? Do 
 you realize that your definition for regressions turns _almost every_ 
 current bug in the kernel into a regression?

Ingo, why do you keep harping this issue? I thought I suggested we agree
to disagree on this and let it rest.

And I would say that, yes, that is a regression, if that config option
is a core option that people are likely to enable. The CONFIG_NO_HZ is a
new option, people will select it. Your example pertain more to the 'use
mmio for IO operations' type options for drivers. If you enable that and
your driver suddenly stops working, you have a clear idea of WHY it
stops working and how to fix it. Not so with this blktrace scenario, I
bet that would take people quite a while to figure out how it broke.

Can we drop this subject now, please? The issue is resolved (and
merged), debating definitions of regressions is not very productive :-)

-- 
Jens Axboe

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] block: fix blktrace timestamps

2008-01-14 Thread Ingo Molnar

* Ingo Molnar [EMAIL PROTECTED] wrote:

 because a perfectly working system is:
 
  a user's .config that worked before should work with the new kernel
   too
 
 not:
 
  a user's .config that worked before should work now too, with random
   new kernel features enabled as well.
 
 the latter appears to be the rule you are applying, but it's not the 
 regression rule we are using.

Jens, just to bring your definition of regressions to its logical 
conclusion: does this mean that if there is any longstanding bug in the 
block layer that you know about, but i didnt ever utilize that bit of 
the block layer it in my .config, and if i enable it now in the .config 
and i experience that bug, does it suddenly count as a regression? Do 
you realize that your definition for regressions turns _almost every_ 
current bug in the kernel into a regression?

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] block: fix blktrace timestamps

2008-01-14 Thread Christoph Hellwig

Folks, this is getting a little silly.

Even if CONFIG_NO_HZ is new this is a an important regression, and
yes we should avoid regressions wherever we can, and for such a quite
important feature we should fix it.  On the other hand blktrace is using
the wrong interface, and it has been told multiple times on lkml that
sched_clock() should never ever be used outside the scheduler.  So the
burden to fix this regression lies on the shoulders of the blktrace
maintainer.

No need for silly name calling here.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] block: fix blktrace timestamps

2008-01-13 Thread Ingo Molnar

* Jens Axboe <[EMAIL PROTECTED]> wrote:

> > - Fact: feature CONFIG_CPU_IDLE=y did not exist in 64-bit x86 in v2.6.23.
> >   It has known bugs but they are not flagged 'regressions' (because the 
> >   feature did not exist before and the option is default-disabled) hence 
> >   the feature can stay. Some fixes to it are too dangerous or too 
> >   unproven and are delayed to v2.6.25.
> 
> > 
> > and now apply the same rule to CONFIG_NO_HZ:
> > 
> > - Fact: feature CONFIG_NO_HZ=y did not exist in 64-bit x86 in v2.6.23.
> >   It has known bugs but they are not flagged 'regressions' (because the 
> >   feature did not exist before and the option is default-disabled) hence 
> >   the feature can stay. Some fixes to it are too dangerous or too 
> >   unproven and are delayed to v2.6.25.
> 
> Forget about this unrelated feature, it has nothing to do with this 
> situation. [...]

why "forget" about a perfectly valid example right from v2.6.24 that 
demonstrates our definition for regressions and demonstrates the rules 
we are applying? Because it does not fit your argument? :)

> [...] What if some (eg) sata driver broke because someone enabled 
> CONFIG_NO_HZ. Would you say that's not a regression because it worked 
> before that option was there? [...]

as long as it's the same situation as have here (i.e. that NO_HZ 
appeared on an architecture where it wasnt offered before): yes, 
exactly, it's _NOT_ a regression in that sata driver [the sata driver 
might not even have been touched], and it's _NOT_ a regression in 
CONFIG_NO_HZ either because a regression can only be something that 
"something breaks that worked before". In that case it's a _bug_ in the 
newly introduced NO_HZ code.

> [...] That's crap, no user would buy that argument. As far as the user 
> is concerned, it IS a regression.

the user enabled a newly introduced kernel feature and that new feature 
might be not perfect. It might break a whole lot of unrelated code. We 
consider them bugs (of course), and work on fixing them, but still we 
dont consider such bugs a regression and dont revert a new feature if 
it's not fixed. It's a special-case. We apply that special case to all 
sorts of new kernel features, such as the new iwl3945 driver in 2.6.24, 
which has a number of unfixed bugs. We applied that to new kernel 
features as well, such as containers/control-groups in 2.6.23, etc. This 
is nothing new, we did this numerous times ever since the new-style 
development model was introduced around 2.6.12. _You_ might not know 
about it but that does not change the situation.

because a perfectly working system is:

 "a user's .config that worked before should work with the new kernel 
  too"

not:

 "a user's .config that worked before should work now too, with random
  new kernel features enabled as well."

the latter appears to be the rule you are applying, but it's not the 
regression rule we are using.

> Since we obviously disagree on this, lets agree to not debate it any 
> further. It doesn't matter anyway, since the issue is resolved.

this is not a matter of opinion or "disagreement", this is a matter of 
fact. You were the one telling me: "go fix that regression", so this 
issue might come up anytime in the future. Better clear it once and for 
all, we all have got better things to do than to rehash the same issue 
again and again.

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] block: fix blktrace timestamps

2008-01-13 Thread Ingo Molnar

* Jens Axboe [EMAIL PROTECTED] wrote:

  - Fact: feature CONFIG_CPU_IDLE=y did not exist in 64-bit x86 in v2.6.23.
It has known bugs but they are not flagged 'regressions' (because the 
feature did not exist before and the option is default-disabled) hence 
the feature can stay. Some fixes to it are too dangerous or too 
unproven and are delayed to v2.6.25.
 
  
  and now apply the same rule to CONFIG_NO_HZ:
  
  - Fact: feature CONFIG_NO_HZ=y did not exist in 64-bit x86 in v2.6.23.
It has known bugs but they are not flagged 'regressions' (because the 
feature did not exist before and the option is default-disabled) hence 
the feature can stay. Some fixes to it are too dangerous or too 
unproven and are delayed to v2.6.25.
 
 Forget about this unrelated feature, it has nothing to do with this 
 situation. [...]

why forget about a perfectly valid example right from v2.6.24 that 
demonstrates our definition for regressions and demonstrates the rules 
we are applying? Because it does not fit your argument? :)

 [...] What if some (eg) sata driver broke because someone enabled 
 CONFIG_NO_HZ. Would you say that's not a regression because it worked 
 before that option was there? [...]

as long as it's the same situation as have here (i.e. that NO_HZ 
appeared on an architecture where it wasnt offered before): yes, 
exactly, it's _NOT_ a regression in that sata driver [the sata driver 
might not even have been touched], and it's _NOT_ a regression in 
CONFIG_NO_HZ either because a regression can only be something that 
something breaks that worked before. In that case it's a _bug_ in the 
newly introduced NO_HZ code.

 [...] That's crap, no user would buy that argument. As far as the user 
 is concerned, it IS a regression.

the user enabled a newly introduced kernel feature and that new feature 
might be not perfect. It might break a whole lot of unrelated code. We 
consider them bugs (of course), and work on fixing them, but still we 
dont consider such bugs a regression and dont revert a new feature if 
it's not fixed. It's a special-case. We apply that special case to all 
sorts of new kernel features, such as the new iwl3945 driver in 2.6.24, 
which has a number of unfixed bugs. We applied that to new kernel 
features as well, such as containers/control-groups in 2.6.23, etc. This 
is nothing new, we did this numerous times ever since the new-style 
development model was introduced around 2.6.12. _You_ might not know 
about it but that does not change the situation.

because a perfectly working system is:

 a user's .config that worked before should work with the new kernel 
  too

not:

 a user's .config that worked before should work now too, with random
  new kernel features enabled as well.

the latter appears to be the rule you are applying, but it's not the 
regression rule we are using.

 Since we obviously disagree on this, lets agree to not debate it any 
 further. It doesn't matter anyway, since the issue is resolved.

this is not a matter of opinion or disagreement, this is a matter of 
fact. You were the one telling me: go fix that regression, so this 
issue might come up anytime in the future. Better clear it once and for 
all, we all have got better things to do than to rehash the same issue 
again and again.

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] block: fix blktrace timestamps

2008-01-11 Thread Jens Axboe
On Fri, Jan 11 2008, Ingo Molnar wrote:
> 
> * David Dillow <[EMAIL PROTECTED]> wrote:
> 
> > On Fri, 2008-01-11 at 11:29 +0100, Ingo Molnar wrote:
> > > (David, could you try the patch further below - does it fix bkltrace 
> > > timestamps too?)
> > 
> > Jens already got back to you, but I can confirm it works here as well.
> 
> great, thanks for testing it. (i'm not sure Jens saw the same issues as 
> you, so your test was useful too.)

Right, I merely checked that ktime_get() worked for me.

-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] block: fix blktrace timestamps

2008-01-11 Thread Jens Axboe
On Fri, Jan 11 2008, Ingo Molnar wrote:
> 
> * Jens Axboe <[EMAIL PROTECTED]> wrote:
> 
> > > > If blktrace worked in 2.6.23 and it doesn't in 2.6.24 because of 
> > > > some option that isn't immediately apparent, then it's a 
> > > > regression. Period.
> > > 
> > > not completely correct. CONFIG_NO_HZ is a default-disabled option 
> > > that became newly available on 64-bit x86. So if NO_HZ does not 
> > > completely work on 64-bit, and if 32-bit works fine - which we dont 
> > > know yet (my guess would be that it's similarly broken on the same 
> > > box) then it's not a regression.
> > 
> > Ingo, it doesn't matter if the option is disabled by default or not! 
> > The fact is that functionality foo works in 2.6.23 and doesn't in 
> > 2.6.24 because of something unrelated. And that IS a regression, no 
> > matter what kind of word play you are doing here :-)
> 
> argh, Jens. Really. I believe you stopped using your brain because 
> CONFIG_BKL_IO_TRACE=y is affected by this bug and apparently you've got 
> a weak spot for it ;)

I don't think so.

> Think about it another way then, in the context of another, real kernel 
> feature we introduced in v2.6.24, namely CONFIG_CPU_IDLE=y:
> 
> - Fact: feature CONFIG_CPU_IDLE=y did not exist in 64-bit x86 in v2.6.23.
>   It has known bugs but they are not flagged 'regressions' (because the 
>   feature did not exist before and the option is default-disabled) hence 
>   the feature can stay. Some fixes to it are too dangerous or too 
>   unproven and are delayed to v2.6.25.

> 
> and now apply the same rule to CONFIG_NO_HZ:
> 
> - Fact: feature CONFIG_NO_HZ=y did not exist in 64-bit x86 in v2.6.23.
>   It has known bugs but they are not flagged 'regressions' (because the 
>   feature did not exist before and the option is default-disabled) hence 
>   the feature can stay. Some fixes to it are too dangerous or too 
>   unproven and are delayed to v2.6.25.

Forget about this unrelated feature, it has nothing to do with this
situation. What if some (eg) sata driver broke because someone enabled
CONFIG_NO_HZ. Would you say that's not a regression because it worked
before that option was there? That's crap, no user would buy that
argument. As far as the user is concerned, it IS a regression.

Since we obviously disagree on this, lets agree to not debate it any
further. It doesn't matter anyway, since the issue is resolved.

> > [...] What is the cost of ktime_get() compared to sched_clock()?
> 
> compared to the costs of IO transfers it should be OK. It can be really 
> fast but in the worst-case it can be as slow as 3-6 usecs (when we use 
> the acpi_pm clocksource). If there's complaints then probably the 
> networking folks will yell first :)

How does that compare to sched_clock()? 3-6 usec would render blktrace
unusable for certain types of testing, unfortunately. This isn't just
queried per IO, it's done for every action for that IO. So you'd
typically have up to eg 10 traces for a single IO piece, perhaps even
more.

For 2.6.24 the solution is fine, I'll revisit the impact when I can.

-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] block: fix blktrace timestamps

2008-01-11 Thread Ingo Molnar

* David Dillow <[EMAIL PROTECTED]> wrote:

> On Fri, 2008-01-11 at 11:29 +0100, Ingo Molnar wrote:
> > (David, could you try the patch further below - does it fix bkltrace 
> > timestamps too?)
> 
> Jens already got back to you, but I can confirm it works here as well.

great, thanks for testing it. (i'm not sure Jens saw the same issues as 
you, so your test was useful too.)

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] block: fix blktrace timestamps

2008-01-11 Thread David Dillow

On Fri, 2008-01-11 at 11:29 +0100, Ingo Molnar wrote:
> (David, could you try the patch further below - does it fix bkltrace 
> timestamps too?)

Jens already got back to you, but I can confirm it works here as well.
-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] block: fix blktrace timestamps

2008-01-11 Thread Ingo Molnar

* Jens Axboe <[EMAIL PROTECTED]> wrote:

> > > If blktrace worked in 2.6.23 and it doesn't in 2.6.24 because of 
> > > some option that isn't immediately apparent, then it's a 
> > > regression. Period.
> > 
> > not completely correct. CONFIG_NO_HZ is a default-disabled option 
> > that became newly available on 64-bit x86. So if NO_HZ does not 
> > completely work on 64-bit, and if 32-bit works fine - which we dont 
> > know yet (my guess would be that it's similarly broken on the same 
> > box) then it's not a regression.
> 
> Ingo, it doesn't matter if the option is disabled by default or not! 
> The fact is that functionality foo works in 2.6.23 and doesn't in 
> 2.6.24 because of something unrelated. And that IS a regression, no 
> matter what kind of word play you are doing here :-)

argh, Jens. Really. I believe you stopped using your brain because 
CONFIG_BKL_IO_TRACE=y is affected by this bug and apparently you've got 
a weak spot for it ;)

Think about it another way then, in the context of another, real kernel 
feature we introduced in v2.6.24, namely CONFIG_CPU_IDLE=y:

- Fact: feature CONFIG_CPU_IDLE=y did not exist in 64-bit x86 in v2.6.23.
  It has known bugs but they are not flagged 'regressions' (because the 
  feature did not exist before and the option is default-disabled) hence 
  the feature can stay. Some fixes to it are too dangerous or too 
  unproven and are delayed to v2.6.25.

and now apply the same rule to CONFIG_NO_HZ:

- Fact: feature CONFIG_NO_HZ=y did not exist in 64-bit x86 in v2.6.23.
  It has known bugs but they are not flagged 'regressions' (because the 
  feature did not exist before and the option is default-disabled) hence 
  the feature can stay. Some fixes to it are too dangerous or too 
  unproven and are delayed to v2.6.25.

ok?

Yes, it's bad that this happened, and perhaps it _is_ a regression, but 
not for the reason you claim. [ It might be a regression if 32-bit 
blktrace has the same problem under NO_HZ for example _AND_ bkltrace 
worked perfectly on the same box on v2.6.23. ]

Kernel regressions have a _strict_ definition: they mean "anything that 
worked before will work in the future too". Not: "anything that worked 
before will work in the future too if you enable random new non-default 
kernel features".

[ If the latter was the criterium we'd probably never see new features
  integrated! New stuff has bugs, because it's not nearly as well-tested 
  as older stuff. What matters is to 1) not turn it on by default if it 
  has known bugs 2) to always make positive progress on it, i.e.: to not
  regress new features in future kernel releases. This way, in the ideal 
  case, we'll have a monotonic series towards a perfect, bug-free kernel 
  ;) ]

> > ktime_get() should have been used instead, which is a proper GTOD 
> > clocksource. The patch below implements this.
> 
> Will give it a whirl, it looks promising indeed and gets rid of the 
> ugly cpu sync stuff. [...]

fantastic! :)

> [...] What is the cost of ktime_get() compared to sched_clock()?

compared to the costs of IO transfers it should be OK. It can be really 
fast but in the worst-case it can be as slow as 3-6 usecs (when we use 
the acpi_pm clocksource). If there's complaints then probably the 
networking folks will yell first :)

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] block: fix blktrace timestamps

2008-01-11 Thread Jens Axboe
On Fri, Jan 11 2008, Jens Axboe wrote:
> > ktime_get() should have been used instead, which is a proper GTOD 
> > clocksource. The patch below implements this.
> 
> Will give it a whirl, it looks promising indeed and gets rid of the ugly
> cpu sync stuff. What is the cost of ktime_get() compared to
> sched_clock()?

Works for me - will apply it right now, we can always evaluate any
potential performance impact later. It's more important to have a
functional blktrace in 2.6.24.

Thanks Ingo!

-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] block: fix blktrace timestamps

2008-01-11 Thread Jens Axboe
On Fri, Jan 11 2008, Ingo Molnar wrote:
> 
> (David, could you try the patch further below - does it fix bkltrace 
> timestamps too?)
> 
> * Jens Axboe <[EMAIL PROTECTED]> wrote:
> 
> > On Fri, Jan 11 2008, Ingo Molnar wrote:
> > > 
> > > * Jens Axboe <[EMAIL PROTECTED]> wrote:
> > > 
> > > > > they are from the scheduler git tree (except the first debug patch), 
> > > > > but queued up for v2.6.25 at the moment.
> > > > 
> > > > So this means that blktrace will be broken with CONFIG_NO_HZ for 
> > > > 2.6.24? That's clearly a regression.
> > > 
> > > 64-bit CONFIG_NO_HZ is a new feature in v2.6.24. If it happens on 
> > > 32-bit too and it didnt happen in v2.6.23 32-bit then it's a 
> > > regression.
> > 
> > If blktrace worked in 2.6.23 and it doesn't in 2.6.24 because of some 
> > option that isn't immediately apparent, then it's a regression. 
> > Period.
> 
> not completely correct. CONFIG_NO_HZ is a default-disabled option that 
> became newly available on 64-bit x86. So if NO_HZ does not completely 
> work on 64-bit, and if 32-bit works fine - which we dont know yet (my 
> guess would be that it's similarly broken on the same box) then it's not 
> a regression.

Ingo, it doesn't matter if the option is disabled by default or not!
The fact is that functionality foo works in 2.6.23 and doesn't in 2.6.24
because of something unrelated. And that IS a regression, no matter what
kind of word play you are doing here :-)

> But even if it's not "technically" a regression, it's something we want 
> to fix in .24 if we can, so i'm all with you Jens :)

That's good :)

> ktime_get() should have been used instead, which is a proper GTOD 
> clocksource. The patch below implements this.

Will give it a whirl, it looks promising indeed and gets rid of the ugly
cpu sync stuff. What is the cost of ktime_get() compared to
sched_clock()?

-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] block: fix blktrace timestamps

2008-01-11 Thread Ingo Molnar

* Guillaume Chazarain <[EMAIL PROTECTED]> wrote:

> > Correction: it was not a high res time source, it was "the 
> > scheduler's per-cpu, non-exported, non-coherent, 
> > warps-and-jumps-like-hell high-res timesource that was intentionally 
> > called the _sched_ clock" ;-)
> 
> I think the warts of cpu_clock() are fixable, except maybe 
> unsynchronization on SMP which is harder.

yes - but in terms of v2.6.24, the bkltrace patch i sent is a lot less 
risky.

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] block: fix blktrace timestamps

2008-01-11 Thread Guillaume Chazarain
Ingo Molnar <[EMAIL PROTECTED]> wrote:

> Correction: it was not a high res time source, it was "the scheduler's 
> per-cpu, non-exported, non-coherent, warps-and-jumps-like-hell high-res 
> timesource that was intentionally called the _sched_ clock" ;-)

I think the warts of cpu_clock() are fixable, except maybe
unsynchronization on SMP which is harder.

-- 
Guillaume
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[patch] block: fix blktrace timestamps

2008-01-11 Thread Ingo Molnar

(David, could you try the patch further below - does it fix bkltrace 
timestamps too?)

* Jens Axboe <[EMAIL PROTECTED]> wrote:

> On Fri, Jan 11 2008, Ingo Molnar wrote:
> > 
> > * Jens Axboe <[EMAIL PROTECTED]> wrote:
> > 
> > > > they are from the scheduler git tree (except the first debug patch), 
> > > > but queued up for v2.6.25 at the moment.
> > > 
> > > So this means that blktrace will be broken with CONFIG_NO_HZ for 
> > > 2.6.24? That's clearly a regression.
> > 
> > 64-bit CONFIG_NO_HZ is a new feature in v2.6.24. If it happens on 
> > 32-bit too and it didnt happen in v2.6.23 32-bit then it's a 
> > regression.
> 
> If blktrace worked in 2.6.23 and it doesn't in 2.6.24 because of some 
> option that isn't immediately apparent, then it's a regression. 
> Period.

not completely correct. CONFIG_NO_HZ is a default-disabled option that 
became newly available on 64-bit x86. So if NO_HZ does not completely 
work on 64-bit, and if 32-bit works fine - which we dont know yet (my 
guess would be that it's similarly broken on the same box) then it's not 
a regression.

But even if it's not "technically" a regression, it's something we want 
to fix in .24 if we can, so i'm all with you Jens :)

and you can see that by looking at the patch. It does not unbreak 
something that 2.6.24 broke. It makes something work better that used to 
be so broken for such a long time. (sched_clock() based blktrace)

anyway, the most dangerous one of these patches has become well-tested 
meanwhile in x86.git. So maybe we can get this fix into v2.6.24. Maybe 
not. We'll see.

> > all this comes from blktrace's original decision of using sched_clock()
> > :-) It's not a global timesource and it's not trivial to turn it into a
> > halfways usable global timesource.
> 
> Hey, it was a high res time source and the only one easily available 
> :) I'm fine with using another timesource, I'll take suggestions or 
> patches any day!

Correction: it was not a high res time source, it was "the scheduler's 
per-cpu, non-exported, non-coherent, warps-and-jumps-like-hell high-res 
timesource that was intentionally called the _sched_ clock" ;-)

ktime_get() should have been used instead, which is a proper GTOD 
clocksource. The patch below implements this.

Ingo

->
Subject: block: fix blktrace timestamps
From: Ingo Molnar <[EMAIL PROTECTED]>

David Dillow reported broken blktrace timestamps. The reason
is cpu_clock() which is not a global time source.

Fix bkltrace timestamps by using ktime_get() like the networking
code does for packet timestamps. This also removes a whole lot
of complexity from bkltrace.c and shrinks the code by 500 bytes:

   textdata bss dec hex filename
   2888 124  443056 bf0 blktrace.o.before
   2390 116  442550 9f6 blktrace.o.after

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 block/blktrace.c |   69 +--
 1 file changed, 2 insertions(+), 67 deletions(-)

Index: linux-x86.q/block/blktrace.c
===
--- linux-x86.q.orig/block/blktrace.c
+++ linux-x86.q/block/blktrace.c
@@ -25,7 +25,6 @@
 #include 
 #include 
 
-static DEFINE_PER_CPU(unsigned long long, blk_trace_cpu_offset) = { 0, };
 static unsigned int blktrace_seq __read_mostly = 1;
 
 /*
@@ -41,7 +40,7 @@ static void trace_note(struct blk_trace 
const int cpu = smp_processor_id();
 
t->magic = BLK_IO_TRACE_MAGIC | BLK_IO_TRACE_VERSION;
-   t->time = cpu_clock(cpu) - per_cpu(blk_trace_cpu_offset, cpu);
+   t->time = ktime_to_ns(ktime_get());
t->device = bt->dev;
t->action = action;
t->pid = pid;
@@ -159,7 +158,7 @@ void __blk_add_trace(struct blk_trace *b
 
t->magic = BLK_IO_TRACE_MAGIC | BLK_IO_TRACE_VERSION;
t->sequence = ++(*sequence);
-   t->time = cpu_clock(cpu) - per_cpu(blk_trace_cpu_offset, cpu);
+   t->time = ktime_to_ns(ktime_get());
t->sector = sector;
t->bytes = bytes;
t->action = what;
@@ -506,73 +505,9 @@ void blk_trace_shutdown(struct request_q
}
 }
 
-/*
- * Average offset over two calls to cpu_clock() with a gettimeofday()
- * in the middle
- */
-static void blk_check_time(unsigned long long *t, int this_cpu)
-{
-   unsigned long long a, b;
-   struct timeval tv;
-
-   a = cpu_clock(this_cpu);
-   do_gettimeofday();
-   b = cpu_clock(this_cpu);
-
-   *t = tv.tv_sec * 10 + tv.tv_usec * 1000;
-   *t -= (a + b) / 2;
-}
-
-/*
- * calibrate our inter-CPU timings
- */
-static void blk_trace_check_cpu_time(void *data)
-{
-   unsigned long long *t;
-   int this_cpu = get_cpu();
-
-   t = _cpu(blk_trace_cpu_offset, this_cpu);
-
-   /*
-* Just call it twice, hopefully the second call will be cache 

Re: [patch] block: fix blktrace timestamps

2008-01-11 Thread Guillaume Chazarain
Ingo Molnar [EMAIL PROTECTED] wrote:

 Correction: it was not a high res time source, it was the scheduler's 
 per-cpu, non-exported, non-coherent, warps-and-jumps-like-hell high-res 
 timesource that was intentionally called the _sched_ clock ;-)

I think the warts of cpu_clock() are fixable, except maybe
unsynchronization on SMP which is harder.

-- 
Guillaume
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] block: fix blktrace timestamps

2008-01-11 Thread Ingo Molnar

* Guillaume Chazarain [EMAIL PROTECTED] wrote:

  Correction: it was not a high res time source, it was the 
  scheduler's per-cpu, non-exported, non-coherent, 
  warps-and-jumps-like-hell high-res timesource that was intentionally 
  called the _sched_ clock ;-)
 
 I think the warts of cpu_clock() are fixable, except maybe 
 unsynchronization on SMP which is harder.

yes - but in terms of v2.6.24, the bkltrace patch i sent is a lot less 
risky.

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[patch] block: fix blktrace timestamps

2008-01-11 Thread Ingo Molnar

(David, could you try the patch further below - does it fix bkltrace 
timestamps too?)

* Jens Axboe [EMAIL PROTECTED] wrote:

 On Fri, Jan 11 2008, Ingo Molnar wrote:
  
  * Jens Axboe [EMAIL PROTECTED] wrote:
  
they are from the scheduler git tree (except the first debug patch), 
but queued up for v2.6.25 at the moment.
   
   So this means that blktrace will be broken with CONFIG_NO_HZ for 
   2.6.24? That's clearly a regression.
  
  64-bit CONFIG_NO_HZ is a new feature in v2.6.24. If it happens on 
  32-bit too and it didnt happen in v2.6.23 32-bit then it's a 
  regression.
 
 If blktrace worked in 2.6.23 and it doesn't in 2.6.24 because of some 
 option that isn't immediately apparent, then it's a regression. 
 Period.

not completely correct. CONFIG_NO_HZ is a default-disabled option that 
became newly available on 64-bit x86. So if NO_HZ does not completely 
work on 64-bit, and if 32-bit works fine - which we dont know yet (my 
guess would be that it's similarly broken on the same box) then it's not 
a regression.

But even if it's not technically a regression, it's something we want 
to fix in .24 if we can, so i'm all with you Jens :)

and you can see that by looking at the patch. It does not unbreak 
something that 2.6.24 broke. It makes something work better that used to 
be so broken for such a long time. (sched_clock() based blktrace)

anyway, the most dangerous one of these patches has become well-tested 
meanwhile in x86.git. So maybe we can get this fix into v2.6.24. Maybe 
not. We'll see.

  all this comes from blktrace's original decision of using sched_clock()
  :-) It's not a global timesource and it's not trivial to turn it into a
  halfways usable global timesource.
 
 Hey, it was a high res time source and the only one easily available 
 :) I'm fine with using another timesource, I'll take suggestions or 
 patches any day!

Correction: it was not a high res time source, it was the scheduler's 
per-cpu, non-exported, non-coherent, warps-and-jumps-like-hell high-res 
timesource that was intentionally called the _sched_ clock ;-)

ktime_get() should have been used instead, which is a proper GTOD 
clocksource. The patch below implements this.

Ingo

-
Subject: block: fix blktrace timestamps
From: Ingo Molnar [EMAIL PROTECTED]

David Dillow reported broken blktrace timestamps. The reason
is cpu_clock() which is not a global time source.

Fix bkltrace timestamps by using ktime_get() like the networking
code does for packet timestamps. This also removes a whole lot
of complexity from bkltrace.c and shrinks the code by 500 bytes:

   textdata bss dec hex filename
   2888 124  443056 bf0 blktrace.o.before
   2390 116  442550 9f6 blktrace.o.after

Signed-off-by: Ingo Molnar [EMAIL PROTECTED]
---
 block/blktrace.c |   69 +--
 1 file changed, 2 insertions(+), 67 deletions(-)

Index: linux-x86.q/block/blktrace.c
===
--- linux-x86.q.orig/block/blktrace.c
+++ linux-x86.q/block/blktrace.c
@@ -25,7 +25,6 @@
 #include linux/time.h
 #include asm/uaccess.h
 
-static DEFINE_PER_CPU(unsigned long long, blk_trace_cpu_offset) = { 0, };
 static unsigned int blktrace_seq __read_mostly = 1;
 
 /*
@@ -41,7 +40,7 @@ static void trace_note(struct blk_trace 
const int cpu = smp_processor_id();
 
t-magic = BLK_IO_TRACE_MAGIC | BLK_IO_TRACE_VERSION;
-   t-time = cpu_clock(cpu) - per_cpu(blk_trace_cpu_offset, cpu);
+   t-time = ktime_to_ns(ktime_get());
t-device = bt-dev;
t-action = action;
t-pid = pid;
@@ -159,7 +158,7 @@ void __blk_add_trace(struct blk_trace *b
 
t-magic = BLK_IO_TRACE_MAGIC | BLK_IO_TRACE_VERSION;
t-sequence = ++(*sequence);
-   t-time = cpu_clock(cpu) - per_cpu(blk_trace_cpu_offset, cpu);
+   t-time = ktime_to_ns(ktime_get());
t-sector = sector;
t-bytes = bytes;
t-action = what;
@@ -506,73 +505,9 @@ void blk_trace_shutdown(struct request_q
}
 }
 
-/*
- * Average offset over two calls to cpu_clock() with a gettimeofday()
- * in the middle
- */
-static void blk_check_time(unsigned long long *t, int this_cpu)
-{
-   unsigned long long a, b;
-   struct timeval tv;
-
-   a = cpu_clock(this_cpu);
-   do_gettimeofday(tv);
-   b = cpu_clock(this_cpu);
-
-   *t = tv.tv_sec * 10 + tv.tv_usec * 1000;
-   *t -= (a + b) / 2;
-}
-
-/*
- * calibrate our inter-CPU timings
- */
-static void blk_trace_check_cpu_time(void *data)
-{
-   unsigned long long *t;
-   int this_cpu = get_cpu();
-
-   t = per_cpu(blk_trace_cpu_offset, this_cpu);
-
-   /*
-* Just call it twice, hopefully the second call will be cache hot
-* and a little more precise
-  

Re: [patch] block: fix blktrace timestamps

2008-01-11 Thread Ingo Molnar

* Jens Axboe [EMAIL PROTECTED] wrote:

   If blktrace worked in 2.6.23 and it doesn't in 2.6.24 because of 
   some option that isn't immediately apparent, then it's a 
   regression. Period.
  
  not completely correct. CONFIG_NO_HZ is a default-disabled option 
  that became newly available on 64-bit x86. So if NO_HZ does not 
  completely work on 64-bit, and if 32-bit works fine - which we dont 
  know yet (my guess would be that it's similarly broken on the same 
  box) then it's not a regression.
 
 Ingo, it doesn't matter if the option is disabled by default or not! 
 The fact is that functionality foo works in 2.6.23 and doesn't in 
 2.6.24 because of something unrelated. And that IS a regression, no 
 matter what kind of word play you are doing here :-)

argh, Jens. Really. I believe you stopped using your brain because 
CONFIG_BKL_IO_TRACE=y is affected by this bug and apparently you've got 
a weak spot for it ;)

Think about it another way then, in the context of another, real kernel 
feature we introduced in v2.6.24, namely CONFIG_CPU_IDLE=y:

- Fact: feature CONFIG_CPU_IDLE=y did not exist in 64-bit x86 in v2.6.23.
  It has known bugs but they are not flagged 'regressions' (because the 
  feature did not exist before and the option is default-disabled) hence 
  the feature can stay. Some fixes to it are too dangerous or too 
  unproven and are delayed to v2.6.25.

and now apply the same rule to CONFIG_NO_HZ:

- Fact: feature CONFIG_NO_HZ=y did not exist in 64-bit x86 in v2.6.23.
  It has known bugs but they are not flagged 'regressions' (because the 
  feature did not exist before and the option is default-disabled) hence 
  the feature can stay. Some fixes to it are too dangerous or too 
  unproven and are delayed to v2.6.25.

ok?

Yes, it's bad that this happened, and perhaps it _is_ a regression, but 
not for the reason you claim. [ It might be a regression if 32-bit 
blktrace has the same problem under NO_HZ for example _AND_ bkltrace 
worked perfectly on the same box on v2.6.23. ]

Kernel regressions have a _strict_ definition: they mean anything that 
worked before will work in the future too. Not: anything that worked 
before will work in the future too if you enable random new non-default 
kernel features.

[ If the latter was the criterium we'd probably never see new features
  integrated! New stuff has bugs, because it's not nearly as well-tested 
  as older stuff. What matters is to 1) not turn it on by default if it 
  has known bugs 2) to always make positive progress on it, i.e.: to not
  regress new features in future kernel releases. This way, in the ideal 
  case, we'll have a monotonic series towards a perfect, bug-free kernel 
  ;) ]

  ktime_get() should have been used instead, which is a proper GTOD 
  clocksource. The patch below implements this.
 
 Will give it a whirl, it looks promising indeed and gets rid of the 
 ugly cpu sync stuff. [...]

fantastic! :)

 [...] What is the cost of ktime_get() compared to sched_clock()?

compared to the costs of IO transfers it should be OK. It can be really 
fast but in the worst-case it can be as slow as 3-6 usecs (when we use 
the acpi_pm clocksource). If there's complaints then probably the 
networking folks will yell first :)

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] block: fix blktrace timestamps

2008-01-11 Thread David Dillow

On Fri, 2008-01-11 at 11:29 +0100, Ingo Molnar wrote:
 (David, could you try the patch further below - does it fix bkltrace 
 timestamps too?)

Jens already got back to you, but I can confirm it works here as well.
-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] block: fix blktrace timestamps

2008-01-11 Thread Jens Axboe
On Fri, Jan 11 2008, Ingo Molnar wrote:
 
 (David, could you try the patch further below - does it fix bkltrace 
 timestamps too?)
 
 * Jens Axboe [EMAIL PROTECTED] wrote:
 
  On Fri, Jan 11 2008, Ingo Molnar wrote:
   
   * Jens Axboe [EMAIL PROTECTED] wrote:
   
 they are from the scheduler git tree (except the first debug patch), 
 but queued up for v2.6.25 at the moment.

So this means that blktrace will be broken with CONFIG_NO_HZ for 
2.6.24? That's clearly a regression.
   
   64-bit CONFIG_NO_HZ is a new feature in v2.6.24. If it happens on 
   32-bit too and it didnt happen in v2.6.23 32-bit then it's a 
   regression.
  
  If blktrace worked in 2.6.23 and it doesn't in 2.6.24 because of some 
  option that isn't immediately apparent, then it's a regression. 
  Period.
 
 not completely correct. CONFIG_NO_HZ is a default-disabled option that 
 became newly available on 64-bit x86. So if NO_HZ does not completely 
 work on 64-bit, and if 32-bit works fine - which we dont know yet (my 
 guess would be that it's similarly broken on the same box) then it's not 
 a regression.

Ingo, it doesn't matter if the option is disabled by default or not!
The fact is that functionality foo works in 2.6.23 and doesn't in 2.6.24
because of something unrelated. And that IS a regression, no matter what
kind of word play you are doing here :-)

 But even if it's not technically a regression, it's something we want 
 to fix in .24 if we can, so i'm all with you Jens :)

That's good :)

 ktime_get() should have been used instead, which is a proper GTOD 
 clocksource. The patch below implements this.

Will give it a whirl, it looks promising indeed and gets rid of the ugly
cpu sync stuff. What is the cost of ktime_get() compared to
sched_clock()?

-- 
Jens Axboe

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] block: fix blktrace timestamps

2008-01-11 Thread Jens Axboe
On Fri, Jan 11 2008, Jens Axboe wrote:
  ktime_get() should have been used instead, which is a proper GTOD 
  clocksource. The patch below implements this.
 
 Will give it a whirl, it looks promising indeed and gets rid of the ugly
 cpu sync stuff. What is the cost of ktime_get() compared to
 sched_clock()?

Works for me - will apply it right now, we can always evaluate any
potential performance impact later. It's more important to have a
functional blktrace in 2.6.24.

Thanks Ingo!

-- 
Jens Axboe

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] block: fix blktrace timestamps

2008-01-11 Thread Jens Axboe
On Fri, Jan 11 2008, Ingo Molnar wrote:
 
 * David Dillow [EMAIL PROTECTED] wrote:
 
  On Fri, 2008-01-11 at 11:29 +0100, Ingo Molnar wrote:
   (David, could you try the patch further below - does it fix bkltrace 
   timestamps too?)
  
  Jens already got back to you, but I can confirm it works here as well.
 
 great, thanks for testing it. (i'm not sure Jens saw the same issues as 
 you, so your test was useful too.)

Right, I merely checked that ktime_get() worked for me.

-- 
Jens Axboe

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] block: fix blktrace timestamps

2008-01-11 Thread Jens Axboe
On Fri, Jan 11 2008, Ingo Molnar wrote:
 
 * Jens Axboe [EMAIL PROTECTED] wrote:
 
If blktrace worked in 2.6.23 and it doesn't in 2.6.24 because of 
some option that isn't immediately apparent, then it's a 
regression. Period.
   
   not completely correct. CONFIG_NO_HZ is a default-disabled option 
   that became newly available on 64-bit x86. So if NO_HZ does not 
   completely work on 64-bit, and if 32-bit works fine - which we dont 
   know yet (my guess would be that it's similarly broken on the same 
   box) then it's not a regression.
  
  Ingo, it doesn't matter if the option is disabled by default or not! 
  The fact is that functionality foo works in 2.6.23 and doesn't in 
  2.6.24 because of something unrelated. And that IS a regression, no 
  matter what kind of word play you are doing here :-)
 
 argh, Jens. Really. I believe you stopped using your brain because 
 CONFIG_BKL_IO_TRACE=y is affected by this bug and apparently you've got 
 a weak spot for it ;)

I don't think so.

 Think about it another way then, in the context of another, real kernel 
 feature we introduced in v2.6.24, namely CONFIG_CPU_IDLE=y:
 
 - Fact: feature CONFIG_CPU_IDLE=y did not exist in 64-bit x86 in v2.6.23.
   It has known bugs but they are not flagged 'regressions' (because the 
   feature did not exist before and the option is default-disabled) hence 
   the feature can stay. Some fixes to it are too dangerous or too 
   unproven and are delayed to v2.6.25.

 
 and now apply the same rule to CONFIG_NO_HZ:
 
 - Fact: feature CONFIG_NO_HZ=y did not exist in 64-bit x86 in v2.6.23.
   It has known bugs but they are not flagged 'regressions' (because the 
   feature did not exist before and the option is default-disabled) hence 
   the feature can stay. Some fixes to it are too dangerous or too 
   unproven and are delayed to v2.6.25.

Forget about this unrelated feature, it has nothing to do with this
situation. What if some (eg) sata driver broke because someone enabled
CONFIG_NO_HZ. Would you say that's not a regression because it worked
before that option was there? That's crap, no user would buy that
argument. As far as the user is concerned, it IS a regression.

Since we obviously disagree on this, lets agree to not debate it any
further. It doesn't matter anyway, since the issue is resolved.

  [...] What is the cost of ktime_get() compared to sched_clock()?
 
 compared to the costs of IO transfers it should be OK. It can be really 
 fast but in the worst-case it can be as slow as 3-6 usecs (when we use 
 the acpi_pm clocksource). If there's complaints then probably the 
 networking folks will yell first :)

How does that compare to sched_clock()? 3-6 usec would render blktrace
unusable for certain types of testing, unfortunately. This isn't just
queried per IO, it's done for every action for that IO. So you'd
typically have up to eg 10 traces for a single IO piece, perhaps even
more.

For 2.6.24 the solution is fine, I'll revisit the impact when I can.

-- 
Jens Axboe

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] block: fix blktrace timestamps

2008-01-11 Thread Ingo Molnar

* David Dillow [EMAIL PROTECTED] wrote:

 On Fri, 2008-01-11 at 11:29 +0100, Ingo Molnar wrote:
  (David, could you try the patch further below - does it fix bkltrace 
  timestamps too?)
 
 Jens already got back to you, but I can confirm it works here as well.

great, thanks for testing it. (i'm not sure Jens saw the same issues as 
you, so your test was useful too.)

Ingo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/