Re: [PATCH] raise tsc clocksource rating

2007-10-30 Thread Zachary Amsden
On Tue, 2007-10-30 at 10:02 -0200, Glauber de Oliveira Costa wrote:

> > No, if no paravirt clocksource is detected, nothing can override the
> > perfect TSC hardware clocksource rating of 400.  And if a paravirt
> > clocksource is detected, it is always better than TSC.
> 
> Why always? tsc is the best possible alternative the _platform_ can
> provide. So the paravirt clocksource will be at best, as good as tsc.

What is the justification for this claim?  The TSC provides no
information about frequency.  The frequency must be measured from a
known fixed rate source.  This measurement is error prone, both on real
hardware, and even more so in a VM.

The TSC can only provide a single measure of elapsed periods.  It has no
concept to differentiate between elapsed periods of time spent in the
guest, which is relevant to the scheduler, and elapsed periods of time
spent in real time, which is relevant to timekeeping.

So a TSC can not drive both timekeeping and scheduling in a virtual
machine, or if it does, it does so inaccurately.


> And if it is the case: why bother with communication mechanisms of all
> kinds, calculations, etc, if we can just read the tsc?
> 
> Noting again: If the tsc does not arrive accurate to the guest, it will
> fail the tsc clocksource tests. So it will be rated zero anyway

You always need to provide a TSC, and it will always initially appear to
be accurate.  Long term, as time is reported via the TSC is an
approximation somewhere between real time and elapsed running time,
measurements made using it in a VM will drift, and it is impossible to
remove the inaccuracy from one end of the spectrum (by providing real
time exactly) without compromising it at the other end (by skewing
relative time measurement).

A paravirt clocksource is always better than a TSC because it
compensates for these effects.

Zach

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-30 Thread Glauber de Oliveira Costa
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Ingo Molnar escreveu:
> * Rusty Russell <[EMAIL PROTECTED]> wrote:
> 
> and just in case it's not obvious: i am not arguing for the inclusion of 
> the patch, i'm just pointing out the plain fact that in the case where 
> the TSC _is_ reliable, 5 different clocksource drivers for has obvious 
> disadvantages. Anyone arguing against that simple point needs his head 
> examined :) Once we can pass around calibration information from the 
> host to the guest (which we dont do at the moment) there will be reason 
> not to use the native clocksource driver in the guest.
If you sustain that we cannot have a reliable synchronization test
mechanism, neither do I. All this is based in the assumption that a bad
tsc will fail such tests.

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Remi - http://enigmail.mozdev.org

iD8DBQFHJyAHjYI8LaFUWXMRAtYoAJ9l5kNodRLfTsNHDvyioufM7SQzTACfarEy
XXq3sDq9uxZ/72hhA46YmgM=
=DwN/
-END PGP SIGNATURE-
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-30 Thread Glauber de Oliveira Costa
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Zachary Amsden escreveu:
> On Tue, 2007-10-30 at 00:02 +0100, Ingo Molnar wrote:
>> * Zachary Amsden <[EMAIL PROTECTED]> wrote:
> 
>>> Not every guest support paravirt, but for correctness, all guests 
>>> require TSC, which must be exposed all the way up to userspace, no 
>>> matter what the efficiency or accuracy may be.
>> but if there's a perfect TSC available (there is such hardware) then the 
>> TSC _is_ the best clocksource. Paravirt now turns it off unconditionally 
>> in essence.
> 
> No, if no paravirt clocksource is detected, nothing can override the
> perfect TSC hardware clocksource rating of 400.  And if a paravirt
> clocksource is detected, it is always better than TSC.

Why always? tsc is the best possible alternative the _platform_ can
provide. So the paravirt clocksource will be at best, as good as tsc.
And if it is the case: why bother with communication mechanisms of all
kinds, calculations, etc, if we can just read the tsc?

Noting again: If the tsc does not arrive accurate to the guest, it will
fail the tsc clocksource tests. So it will be rated zero anyway



-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Remi - http://enigmail.mozdev.org

iD8DBQFHJx1KjYI8LaFUWXMRAv0hAJ4sj0Z1FraYrgbU5Mj0pWOJGk6jtwCfc5xL
jpTC273X0oqPTCR7NcVHJOI=
=WPzV
-END PGP SIGNATURE-
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-30 Thread Glauber de Oliveira Costa
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Zachary Amsden escreveu:
> On Mon, 2007-10-29 at 23:48 +0100, Ingo Molnar wrote:
>> * Zachary Amsden <[EMAIL PROTECTED]> wrote:
> 
>> if it's inaccurate why are you exposing it to the guest then? Native 
>> only uses the TSC if it's safe and accurate to do so.
> 
> Not every guest support paravirt, but for correctness, all guests
> require TSC, which must be exposed all the way up to userspace, no
> matter what the efficiency or accuracy may be.
> 
> Zach
> 
However, with such accuracy, it will fail the tsc clocksources tests. If
it passes, it'a good clocksource to use.
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Remi - http://enigmail.mozdev.org

iD8DBQFHJxyujYI8LaFUWXMRAhI5AKCDl/O7iA3TdtpvElVg8NoSDVfKpgCeNvz3
ZEZDMxg8T7FA2R+5cgH/uKI=
=8qSH
-END PGP SIGNATURE-
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-30 Thread Rusty Russell
On Tuesday 30 October 2007 18:37:36 Ingo Molnar wrote:
> * Rusty Russell <[EMAIL PROTECTED]> wrote:
> > No.  tsc is very good, it's not perfect.  If a paravirt clock
> > registers 400 it really means "pick me over the tsc".
>
> often the TSC is not perfect, but _IF_ it's perfect, using the paravirt
> driver is a pessimisation in performance.
>
> the main problem at the moment is that there's no mechanism at the
> moment to convey to the guest the information that the TSC is "perfect",
> and to convey the calibration values.

The host can communicate to the guest what clock to use: the guest can decide 
to register a paravirt clock or not depending on whether it wants to leave it 
to the TSC.

For a while we couldn't remove the TSC cpuid capability in the guest, because 
if you configured your kernel in some ways it was hardcoded on.  I think 
the "all 686+ have a tsc" assumption has now been fixed, so I should change 
the lguest clock to do as you said: register its clock at lower prio to the 
TSC and then the host can simply remove the TSC cpuid if it isn't suitable 
for the guest to use.

ISTR the core TSC timing code (which lguest could use) and various hardware 
manipulations (which lguest couldn't) were intertwined, but I'll have to go 
back and check exactly what the issue was.

> and just in case it's not obvious: i am not arguing for the inclusion of
> the patch

Unfortunately, you and Thomas both acked the patch.  This says v bad things 
about how much review kernel patches get.

Cheers,
Rusty.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-30 Thread Ingo Molnar

* Rusty Russell <[EMAIL PROTECTED]> wrote:

> No.  tsc is very good, it's not perfect.  If a paravirt clock 
> registers 400 it really means "pick me over the tsc".

often the TSC is not perfect, but _IF_ it's perfect, using the paravirt 
driver is a pessimisation in performance.

the main problem at the moment is that there's no mechanism at the 
moment to convey to the guest the information that the TSC is "perfect", 
and to convey the calibration values.

> That's *why* they use > 400: it's in the documentation.

static values do not capture conditional quality like that of the TSC.

and just in case it's not obvious: i am not arguing for the inclusion of 
the patch, i'm just pointing out the plain fact that in the case where 
the TSC _is_ reliable, 5 different clocksource drivers for has obvious 
disadvantages. Anyone arguing against that simple point needs his head 
examined :) Once we can pass around calibration information from the 
host to the guest (which we dont do at the moment) there will be reason 
not to use the native clocksource driver in the guest.

in the long run, the paravirt clocksource drivers must become _fallback_ 
drivers, for the case when the TSC is not perfect. Instead of the 
current "lets try to make it reliable but also nearly as fast as the 
TSC", which leads to inevitable breakage on SMP.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-30 Thread Ingo Molnar

* Ian Pratt <[EMAIL PROTECTED]> wrote:

> > > Sigh, I don't really want to have this fight again.
> > 
> > i dont remember us having discussed this before, ever. If there's any
> > "fight" about monotonicity and SMP then it would be a pretty onesided
> > affair, with you being beaten up seriously ;-)
> 
> Actually, it is possible, even for NUMA systems with CPUs running off 
> completely different oscillators, and in the presence of CPU frequency 
> changes, power management, and even in the presence of thermal 
> throttling (though the latter introduces temporary inaccuracies it 
> doesn't affect monotonicity or rate).
> 
> Take a look at the Xen code to see how each physical CPU is 
> independently calibrated on an ongoing basis, how movement of VCPUs 
> between physical CPUs is tracked, and how shared variables are used to 
> ensure montonicity if a guest requires it.

I think that's wishful thinking. Check out:

http://people.redhat.com/mingo/time-warp-test/time-warp-test.c

change TEST_TSC to 0, run it on an SMP guest (on a reasonably fast 
machine) and let me know whether you can make SMP guests not come up 
with monotonicity violations in the CLOCK tests.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-30 Thread Ingo Molnar

* Dan Hecht <[EMAIL PROTECTED]> wrote:

>> but if there's a perfect TSC available (there is such hardware) then 
>> the TSC _is_ the best clocksource. Paravirt now turns it off 
>> unconditionally in essence.
>
> Not really.  In the case hardware TSC is perfect, the paravirt time 
> counter can be implemented directly in terms of hardware TSC; there is 
> no loss in optimization.  This is done transparently.  And virtual TSC 
> can be implemented this way too.

Of course if you duplicate all (or part) of the TSC clocksource driver 
in the paravirt guest code then the "paravirt clocksource" is at least 
as good as the TSC. But that argument is playing word-games, _of course_ 
if you use the same (or similar) code it's at least as good. The real 
question are clocksources that communicate out to the hypervisor, and 
hence have higher overhead than a native, TSC based clocksource - and 
clocksources that use the TSC in a broken way.

> The real improvement that a paravirt clocksource offers over the TSC 
> clocksource is that the guest does not need to measure the TSC 
> frequency itself against some other constant frequency source (which 
> is problematic on a virtual machine). [...]

hey, you need not tell me, i've implemented a hyper-clocksource driver 
myself. But calibration is a boot only issue and there's no reason why 
calibration _has_ to be fragile. For example we could easily extend the 
TSC clocksource driver to not calibrate in the guest but take 
calibration information from the host. It's in essence a trivial and 
obvious extension to calibration. That way we get the highest possible 
performance _and_ we share much of the clocksource driver with the host.

also, the way the TSC is used by guests like Xen is fundamentally 
fragile on SMP. So i have a good reason to distrust the approach of 
hypervisors to timekeeping. The maintenance problem to me is that 
everyone in the paravirt space is busy coding away in their own (often 
broken) direction, replicating the essence of the TSC clocksource driver 
4 times over again and again, with subtle bugs in each variant, even in 
cases where the TSC readout can be trusted perfectly well. 
"Consolidation" and "sharing code" is not a particularly strong point of 
the paravirt projects ;-) (ok, KVM is a notable exception there.)

anyway, i do agree that this patch is wrong currently, mainly due to TSC 
calibration not being reliable in guest-space at the moment - but the 
whole concept of putting a separate clocksource driver into each 
paravirt guest, even in the case where the TSC is perfect, is madness. 
That code, once the hardware gets sane (and there are good signs for 
that), and once calibration can be passed from host to guest reliably, 
_will_ be consolidated, because it makes perfect technical sense.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-30 Thread Ingo Molnar

* Dan Hecht [EMAIL PROTECTED] wrote:

 but if there's a perfect TSC available (there is such hardware) then 
 the TSC _is_ the best clocksource. Paravirt now turns it off 
 unconditionally in essence.

 Not really.  In the case hardware TSC is perfect, the paravirt time 
 counter can be implemented directly in terms of hardware TSC; there is 
 no loss in optimization.  This is done transparently.  And virtual TSC 
 can be implemented this way too.

Of course if you duplicate all (or part) of the TSC clocksource driver 
in the paravirt guest code then the paravirt clocksource is at least 
as good as the TSC. But that argument is playing word-games, _of course_ 
if you use the same (or similar) code it's at least as good. The real 
question are clocksources that communicate out to the hypervisor, and 
hence have higher overhead than a native, TSC based clocksource - and 
clocksources that use the TSC in a broken way.

 The real improvement that a paravirt clocksource offers over the TSC 
 clocksource is that the guest does not need to measure the TSC 
 frequency itself against some other constant frequency source (which 
 is problematic on a virtual machine). [...]

hey, you need not tell me, i've implemented a hyper-clocksource driver 
myself. But calibration is a boot only issue and there's no reason why 
calibration _has_ to be fragile. For example we could easily extend the 
TSC clocksource driver to not calibrate in the guest but take 
calibration information from the host. It's in essence a trivial and 
obvious extension to calibration. That way we get the highest possible 
performance _and_ we share much of the clocksource driver with the host.

also, the way the TSC is used by guests like Xen is fundamentally 
fragile on SMP. So i have a good reason to distrust the approach of 
hypervisors to timekeeping. The maintenance problem to me is that 
everyone in the paravirt space is busy coding away in their own (often 
broken) direction, replicating the essence of the TSC clocksource driver 
4 times over again and again, with subtle bugs in each variant, even in 
cases where the TSC readout can be trusted perfectly well. 
Consolidation and sharing code is not a particularly strong point of 
the paravirt projects ;-) (ok, KVM is a notable exception there.)

anyway, i do agree that this patch is wrong currently, mainly due to TSC 
calibration not being reliable in guest-space at the moment - but the 
whole concept of putting a separate clocksource driver into each 
paravirt guest, even in the case where the TSC is perfect, is madness. 
That code, once the hardware gets sane (and there are good signs for 
that), and once calibration can be passed from host to guest reliably, 
_will_ be consolidated, because it makes perfect technical sense.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-30 Thread Ingo Molnar

* Rusty Russell [EMAIL PROTECTED] wrote:

 No.  tsc is very good, it's not perfect.  If a paravirt clock 
 registers 400 it really means pick me over the tsc.

often the TSC is not perfect, but _IF_ it's perfect, using the paravirt 
driver is a pessimisation in performance.

the main problem at the moment is that there's no mechanism at the 
moment to convey to the guest the information that the TSC is perfect, 
and to convey the calibration values.

 That's *why* they use  400: it's in the documentation.

static values do not capture conditional quality like that of the TSC.

and just in case it's not obvious: i am not arguing for the inclusion of 
the patch, i'm just pointing out the plain fact that in the case where 
the TSC _is_ reliable, 5 different clocksource drivers for has obvious 
disadvantages. Anyone arguing against that simple point needs his head 
examined :) Once we can pass around calibration information from the 
host to the guest (which we dont do at the moment) there will be reason 
not to use the native clocksource driver in the guest.

in the long run, the paravirt clocksource drivers must become _fallback_ 
drivers, for the case when the TSC is not perfect. Instead of the 
current lets try to make it reliable but also nearly as fast as the 
TSC, which leads to inevitable breakage on SMP.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-30 Thread Ingo Molnar

* Ian Pratt [EMAIL PROTECTED] wrote:

   Sigh, I don't really want to have this fight again.
  
  i dont remember us having discussed this before, ever. If there's any
  fight about monotonicity and SMP then it would be a pretty onesided
  affair, with you being beaten up seriously ;-)
 
 Actually, it is possible, even for NUMA systems with CPUs running off 
 completely different oscillators, and in the presence of CPU frequency 
 changes, power management, and even in the presence of thermal 
 throttling (though the latter introduces temporary inaccuracies it 
 doesn't affect monotonicity or rate).
 
 Take a look at the Xen code to see how each physical CPU is 
 independently calibrated on an ongoing basis, how movement of VCPUs 
 between physical CPUs is tracked, and how shared variables are used to 
 ensure montonicity if a guest requires it.

I think that's wishful thinking. Check out:

http://people.redhat.com/mingo/time-warp-test/time-warp-test.c

change TEST_TSC to 0, run it on an SMP guest (on a reasonably fast 
machine) and let me know whether you can make SMP guests not come up 
with monotonicity violations in the CLOCK tests.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-30 Thread Rusty Russell
On Tuesday 30 October 2007 18:37:36 Ingo Molnar wrote:
 * Rusty Russell [EMAIL PROTECTED] wrote:
  No.  tsc is very good, it's not perfect.  If a paravirt clock
  registers 400 it really means pick me over the tsc.

 often the TSC is not perfect, but _IF_ it's perfect, using the paravirt
 driver is a pessimisation in performance.

 the main problem at the moment is that there's no mechanism at the
 moment to convey to the guest the information that the TSC is perfect,
 and to convey the calibration values.

The host can communicate to the guest what clock to use: the guest can decide 
to register a paravirt clock or not depending on whether it wants to leave it 
to the TSC.

For a while we couldn't remove the TSC cpuid capability in the guest, because 
if you configured your kernel in some ways it was hardcoded on.  I think 
the all 686+ have a tsc assumption has now been fixed, so I should change 
the lguest clock to do as you said: register its clock at lower prio to the 
TSC and then the host can simply remove the TSC cpuid if it isn't suitable 
for the guest to use.

ISTR the core TSC timing code (which lguest could use) and various hardware 
manipulations (which lguest couldn't) were intertwined, but I'll have to go 
back and check exactly what the issue was.

 and just in case it's not obvious: i am not arguing for the inclusion of
 the patch

Unfortunately, you and Thomas both acked the patch.  This says v bad things 
about how much review kernel patches get.

Cheers,
Rusty.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-30 Thread Glauber de Oliveira Costa
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Zachary Amsden escreveu:
 On Mon, 2007-10-29 at 23:48 +0100, Ingo Molnar wrote:
 * Zachary Amsden [EMAIL PROTECTED] wrote:
 
 if it's inaccurate why are you exposing it to the guest then? Native 
 only uses the TSC if it's safe and accurate to do so.
 
 Not every guest support paravirt, but for correctness, all guests
 require TSC, which must be exposed all the way up to userspace, no
 matter what the efficiency or accuracy may be.
 
 Zach
 
However, with such accuracy, it will fail the tsc clocksources tests. If
it passes, it'a good clocksource to use.
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Remi - http://enigmail.mozdev.org

iD8DBQFHJxyujYI8LaFUWXMRAhI5AKCDl/O7iA3TdtpvElVg8NoSDVfKpgCeNvz3
ZEZDMxg8T7FA2R+5cgH/uKI=
=8qSH
-END PGP SIGNATURE-
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-30 Thread Glauber de Oliveira Costa
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Zachary Amsden escreveu:
 On Tue, 2007-10-30 at 00:02 +0100, Ingo Molnar wrote:
 * Zachary Amsden [EMAIL PROTECTED] wrote:
 
 Not every guest support paravirt, but for correctness, all guests 
 require TSC, which must be exposed all the way up to userspace, no 
 matter what the efficiency or accuracy may be.
 but if there's a perfect TSC available (there is such hardware) then the 
 TSC _is_ the best clocksource. Paravirt now turns it off unconditionally 
 in essence.
 
 No, if no paravirt clocksource is detected, nothing can override the
 perfect TSC hardware clocksource rating of 400.  And if a paravirt
 clocksource is detected, it is always better than TSC.

Why always? tsc is the best possible alternative the _platform_ can
provide. So the paravirt clocksource will be at best, as good as tsc.
And if it is the case: why bother with communication mechanisms of all
kinds, calculations, etc, if we can just read the tsc?

Noting again: If the tsc does not arrive accurate to the guest, it will
fail the tsc clocksource tests. So it will be rated zero anyway



-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Remi - http://enigmail.mozdev.org

iD8DBQFHJx1KjYI8LaFUWXMRAv0hAJ4sj0Z1FraYrgbU5Mj0pWOJGk6jtwCfc5xL
jpTC273X0oqPTCR7NcVHJOI=
=WPzV
-END PGP SIGNATURE-
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-30 Thread Glauber de Oliveira Costa
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Ingo Molnar escreveu:
 * Rusty Russell [EMAIL PROTECTED] wrote:
 
 and just in case it's not obvious: i am not arguing for the inclusion of 
 the patch, i'm just pointing out the plain fact that in the case where 
 the TSC _is_ reliable, 5 different clocksource drivers for has obvious 
 disadvantages. Anyone arguing against that simple point needs his head 
 examined :) Once we can pass around calibration information from the 
 host to the guest (which we dont do at the moment) there will be reason 
 not to use the native clocksource driver in the guest.
If you sustain that we cannot have a reliable synchronization test
mechanism, neither do I. All this is based in the assumption that a bad
tsc will fail such tests.

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Remi - http://enigmail.mozdev.org

iD8DBQFHJyAHjYI8LaFUWXMRAtYoAJ9l5kNodRLfTsNHDvyioufM7SQzTACfarEy
XXq3sDq9uxZ/72hhA46YmgM=
=DwN/
-END PGP SIGNATURE-
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-30 Thread Zachary Amsden
On Tue, 2007-10-30 at 10:02 -0200, Glauber de Oliveira Costa wrote:

  No, if no paravirt clocksource is detected, nothing can override the
  perfect TSC hardware clocksource rating of 400.  And if a paravirt
  clocksource is detected, it is always better than TSC.
 
 Why always? tsc is the best possible alternative the _platform_ can
 provide. So the paravirt clocksource will be at best, as good as tsc.

What is the justification for this claim?  The TSC provides no
information about frequency.  The frequency must be measured from a
known fixed rate source.  This measurement is error prone, both on real
hardware, and even more so in a VM.

The TSC can only provide a single measure of elapsed periods.  It has no
concept to differentiate between elapsed periods of time spent in the
guest, which is relevant to the scheduler, and elapsed periods of time
spent in real time, which is relevant to timekeeping.

So a TSC can not drive both timekeeping and scheduling in a virtual
machine, or if it does, it does so inaccurately.


 And if it is the case: why bother with communication mechanisms of all
 kinds, calculations, etc, if we can just read the tsc?
 
 Noting again: If the tsc does not arrive accurate to the guest, it will
 fail the tsc clocksource tests. So it will be rated zero anyway

You always need to provide a TSC, and it will always initially appear to
be accurate.  Long term, as time is reported via the TSC is an
approximation somewhere between real time and elapsed running time,
measurements made using it in a VM will drift, and it is impossible to
remove the inaccuracy from one end of the spectrum (by providing real
time exactly) without compromising it at the other end (by skewing
relative time measurement).

A paravirt clocksource is always better than a TSC because it
compensates for these effects.

Zach

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-29 Thread Rusty Russell
On Tuesday 30 October 2007 09:17:38 Thomas Gleixner wrote:
> On Mon, 29 Oct 2007, Glauber de Oliveira Costa wrote:
>
> CC'ed John and removed [EMAIL PROTECTED] :)
>
> > From: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
> >
> > tsc is very good time source (when it does not have drifts, does not
> > change it's frequency, i.e. when it works), so it should have its rating
> > raised to a value greater than, or equal 400.
> >
> > Since it's being a tendency among paravirt clocksources to use values
> > around 400, we should declare tsc as even better: So we use 500.
> >
> > This patch also touches the comments on clocksource.h, which suggests
> > that 499 would be a limit on the rating values.
> >
> > Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
>
> Acked-by: Thomas Gleixner <[EMAIL PROTECTED]>

No.  tsc is very good, it's not perfect.  If a paravirt clock registers 400 it 
really means "pick me over the tsc".

That's *why* they use > 400: it's in the documentation.

Rusty.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] raise tsc clocksource rating

2007-10-29 Thread Ian Pratt
> > Sigh, I don't really want to have this fight again.
> 
> i dont remember us having discussed this before, ever. If there's any
> "fight" about monotonicity and SMP then it would be a pretty onesided
> affair, with you being beaten up seriously ;-)

Actually, it is possible, even for NUMA systems with CPUs running off
completely different oscillators, and in the presence of CPU frequency
changes, power management, and even in the presence of thermal
throttling (though the latter introduces temporary inaccuracies it
doesn't affect monotonicity or rate).

Take a look at the Xen code to see how each physical CPU is
independently calibrated on an ongoing basis, how movement of VCPUs
between physical CPUs is tracked, and how shared variables are used to
ensure montonicity if a guest requires it. 

The fixed-rate TSCs on newer CPUs make some of this stuff easier, but
you still need to cope with different source oscillators and some power
management states.
 
Ian

> > I don't really see what point there is in raising the tsc's rating
> > (why is 300 insufficient again?), but regardless of the value, the
Xen
> > clocksource rating needs to be higher.
> 
> anyway, i agree that this patch cannot go in in its current form.
> 
>   Ingo
> -
> To unsubscribe from this list: send the line "unsubscribe
linux-kernel"
> in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-29 Thread john stultz
On Mon, 2007-10-29 at 23:17 +0100, Thomas Gleixner wrote:
> On Mon, 29 Oct 2007, Glauber de Oliveira Costa wrote:
> 
> CC'ed John and removed [EMAIL PROTECTED] :)
> 
> > From: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
> > 
> > tsc is very good time source (when it does not have drifts, does not
> > change it's frequency, i.e. when it works), so it should have its rating
> > raised to a value greater than, or equal 400.

But all of those qualities (and more) make it less then ideal.

What issue exactly does this patch resolve?

> > Since it's being a tendency among paravirt clocksources to use values
> > around 400, we should declare tsc as even better: So we use 500.

This is the rating inflation I was initially worried about and created
the advisory scale to avoid. If paravirt clocksources are rated too
high, they should be dropped. 

I strongly disagree that the TSC is a "perfect" clocksource, and I think
its rating is appropriate. 


> > This patch also touches the comments on clocksource.h, which suggests
> > that 499 would be a limit on the rating values.
> > 
> > Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
> 
> Acked-by: Thomas Gleixner <[EMAIL PROTECTED]>

If a better justification can be provided, I might reconsider, but right
now I can't ack this.

thanks
-john

> > ---
> >  arch/x86/kernel/tsc_32.c|2 +-
> >  arch/x86/kernel/tsc_64.c|2 +-
> >  include/linux/clocksource.h |2 +-
> >  3 files changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/tsc_32.c b/arch/x86/kernel/tsc_32.c
> > index 9ebc0da..4d91e59 100644
> > --- a/arch/x86/kernel/tsc_32.c
> > +++ b/arch/x86/kernel/tsc_32.c
> > @@ -280,7 +280,7 @@ static cycle_t read_tsc(void)
> >  
> >  static struct clocksource clocksource_tsc = {
> > .name   = "tsc",
> > -   .rating = 300,
> > +   .rating = 500,
> > .read   = read_tsc,
> > .mask   = CLOCKSOURCE_MASK(64),
> > .mult   = 0, /* to be set */
> > diff --git a/arch/x86/kernel/tsc_64.c b/arch/x86/kernel/tsc_64.c
> > index 9c70af4..4fd5b1b 100644
> > --- a/arch/x86/kernel/tsc_64.c
> > +++ b/arch/x86/kernel/tsc_64.c
> > @@ -262,7 +262,7 @@ static cycle_t __vsyscall_fn vread_tsc(void)
> >  
> >  static struct clocksource clocksource_tsc = {
> > .name   = "tsc",
> > -   .rating = 300,
> > +   .rating = 500,
> > .read   = read_tsc,
> > .mask   = CLOCKSOURCE_MASK(64),
> > .shift  = 22,
> > diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
> > index 107787a..5b0aadd 100644
> > --- a/include/linux/clocksource.h
> > +++ b/include/linux/clocksource.h
> > @@ -39,7 +39,7 @@ struct clocksource;
> >   * A correct and usable clocksource.
> >   * 300-399: Desired.
> >   * A reasonably fast and accurate clocksource.
> > - * 400-499: Perfect
> > + * >= 400 : Perfect
> >   * The ideal clocksource. A must-use where
> >   * available.
> >   * @read:  returns a cycle value
> > -- 
> > 1.5.0.6
> > 
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to [EMAIL PROTECTED]
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> > 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-29 Thread H. Peter Anvin

Glauber de Oliveira Costa wrote:

From: Glauber de Oliveira Costa <[EMAIL PROTECTED]>

tsc is very good time source (when it does not have drifts, does not
change it's frequency, i.e. when it works), so it should have its rating
raised to a value greater than, or equal 400.

Since it's being a tendency among paravirt clocksources to use values
around 400, we should declare tsc as even better: So we use 500.

This patch also touches the comments on clocksource.h, which suggests
that 499 would be a limit on the rating values.

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>


Are you sure these paravirt sources don't intentionally trump the TSC?

-hpa

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-29 Thread Jeremy Fitzhardinge
Ingo Molnar wrote:
> i dont remember us having discussed this before, ever. If there's any 
> "fight" about monotonicity and SMP then it would be a pretty onesided 
> affair, with you being beaten up seriously ;-)
>   

This is part of Xen's ABI, so it isn't easily changed.  You're right
that getting monotonicity is a pretty ugly affair of doing something
like "if (now < previous_return) return ++previous_return;", but its
better than using the tsc.

(Last time around, it was "Xen can't use the tsc because it can't ever
be used as a stable timebase" - though I don't think that was you
specifically.)

J
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-29 Thread Ingo Molnar

* Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:

> Ingo Molnar wrote:
> > that's totally broken then. You cannot create an SMP-safe monotonic 
> > clocksource via interpolation - native does not do it either. Good thing 
> > this problem got exposed, it needs to be fixed.
> >   
> 
> Sigh, I don't really want to have this fight again.

i dont remember us having discussed this before, ever. If there's any 
"fight" about monotonicity and SMP then it would be a pretty onesided 
affair, with you being beaten up seriously ;-)

> I don't really see what point there is in raising the tsc's rating 
> (why is 300 insufficient again?), but regardless of the value, the Xen 
> clocksource rating needs to be higher.

anyway, i agree that this patch cannot go in in its current form.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-29 Thread Dan Hecht

On 10/29/2007 04:02 PM, Ingo Molnar wrote:

* Zachary Amsden <[EMAIL PROTECTED]> wrote:


On Mon, 2007-10-29 at 23:48 +0100, Ingo Molnar wrote:

* Zachary Amsden <[EMAIL PROTECTED]> wrote:
if it's inaccurate why are you exposing it to the guest then? Native 
only uses the TSC if it's safe and accurate to do so.
Not every guest support paravirt, but for correctness, all guests 
require TSC, which must be exposed all the way up to userspace, no 
matter what the efficiency or accuracy may be.


but if there's a perfect TSC available (there is such hardware) then the 
TSC _is_ the best clocksource. Paravirt now turns it off unconditionally 
in essence.




Not really.  In the case hardware TSC is perfect, the paravirt time 
counter can be implemented directly in terms of hardware TSC; there is 
no loss in optimization.  This is done transparently.  And virtual TSC 
can be implemented this way too.


The real improvement that a paravirt clocksource offers over the TSC 
clocksource is that the guest does not need to measure the TSC frequency 
itself against some other constant frequency source (which is 
problematic on a virtual machine).  Instead, the paravirt clocksource 
queries the hypervisor for the frequency of the counter.  As you know, 
with clocksource style kernels, it's important to get this frequency 
correct, or else the guest will have long-term time drift.



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-29 Thread Ingo Molnar

* Zachary Amsden <[EMAIL PROTECTED]> wrote:

> On Tue, 2007-10-30 at 00:02 +0100, Ingo Molnar wrote:
> > * Zachary Amsden <[EMAIL PROTECTED]> wrote:
> 
> > > Not every guest support paravirt, but for correctness, all guests 
> > > require TSC, which must be exposed all the way up to userspace, no 
> > > matter what the efficiency or accuracy may be.
> > 
> > but if there's a perfect TSC available (there is such hardware) then the 
> > TSC _is_ the best clocksource. Paravirt now turns it off unconditionally 
> > in essence.
> 
> No, if no paravirt clocksource is detected, nothing can override the
> perfect TSC hardware clocksource rating of 400.  And if a paravirt
> clocksource is detected, it is always better than TSC.
> 
> > anyway, that's at most an optimization issue. No strong feelings here, 
> > and we can certainly delay this patch until this gets all sorted out.
> 
> This patch should be nacked, since it is just wrong.  This is not an 
> optimization issue.  It is an accuracy issue for all virtualization 
> environments that affects long term kernel clock stability, which is 
> important to fix, and the best way to do that is to use a paravirt 
> clocksource.

i know it's not an optimization issue. Your current pessimisation of 
even perfect TSCs _is_ an optimization issue.

(and note that if the TSC is unstable the guest will/should notice it 
and will fall back to the hyper clocksource)

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-29 Thread Jeremy Fitzhardinge
Ingo Molnar wrote:
> that's totally broken then. You cannot create an SMP-safe monotonic 
> clocksource via interpolation - native does not do it either. Good thing 
> this problem got exposed, it needs to be fixed.
>   

Sigh, I don't really want to have this fight again.

I don't really see what point there is in raising the tsc's rating (why
is 300 insufficient again?), but regardless of the value, the Xen
clocksource rating needs to be higher.

J
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-29 Thread Zachary Amsden
On Tue, 2007-10-30 at 00:02 +0100, Ingo Molnar wrote:
> * Zachary Amsden <[EMAIL PROTECTED]> wrote:

> > Not every guest support paravirt, but for correctness, all guests 
> > require TSC, which must be exposed all the way up to userspace, no 
> > matter what the efficiency or accuracy may be.
> 
> but if there's a perfect TSC available (there is such hardware) then the 
> TSC _is_ the best clocksource. Paravirt now turns it off unconditionally 
> in essence.

No, if no paravirt clocksource is detected, nothing can override the
perfect TSC hardware clocksource rating of 400.  And if a paravirt
clocksource is detected, it is always better than TSC.

> anyway, that's at most an optimization issue. No strong feelings here, 
> and we can certainly delay this patch until this gets all sorted out.

This patch should be nacked, since it is just wrong.  This is not an
optimization issue.  It is an accuracy issue for all virtualization
environments that affects long term kernel clock stability, which is
important to fix, and the best way to do that is to use a paravirt
clocksource.

Zach

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-29 Thread Ingo Molnar

* Zachary Amsden <[EMAIL PROTECTED]> wrote:

> On Mon, 2007-10-29 at 23:48 +0100, Ingo Molnar wrote:
> > * Zachary Amsden <[EMAIL PROTECTED]> wrote:
> 
> > if it's inaccurate why are you exposing it to the guest then? Native 
> > only uses the TSC if it's safe and accurate to do so.
> 
> Not every guest support paravirt, but for correctness, all guests 
> require TSC, which must be exposed all the way up to userspace, no 
> matter what the efficiency or accuracy may be.

but if there's a perfect TSC available (there is such hardware) then the 
TSC _is_ the best clocksource. Paravirt now turns it off unconditionally 
in essence.

anyway, that's at most an optimization issue. No strong feelings here, 
and we can certainly delay this patch until this gets all sorted out.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-29 Thread Ingo Molnar

* Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:

> > if it's inaccurate why are you exposing it to the guest then? Native 
> > only uses the TSC if it's safe and accurate to do so.
> 
> It is used as part of the Xen clocksource as a short term 
> extrapolator, with correction parameters supplied by the hypervisor.  
> It should never be used directly.

that's totally broken then. You cannot create an SMP-safe monotonic 
clocksource via interpolation - native does not do it either. Good thing 
this problem got exposed, it needs to be fixed.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-29 Thread Zachary Amsden
On Mon, 2007-10-29 at 23:48 +0100, Ingo Molnar wrote:
> * Zachary Amsden <[EMAIL PROTECTED]> wrote:

> if it's inaccurate why are you exposing it to the guest then? Native 
> only uses the TSC if it's safe and accurate to do so.

Not every guest support paravirt, but for correctness, all guests
require TSC, which must be exposed all the way up to userspace, no
matter what the efficiency or accuracy may be.

Zach

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-29 Thread Jeremy Fitzhardinge
Ingo Molnar wrote:
> * Zachary Amsden <[EMAIL PROTECTED]> wrote:
>
>   
>> On Mon, 2007-10-29 at 20:10 -0300, Glauber de Oliveira Costa wrote:
>> 
>>> From: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
>>>
>>> tsc is very good time source (when it does not have drifts, does not
>>> change it's frequency, i.e. when it works), so it should have its rating
>>> raised to a value greater than, or equal 400.
>>>
>>> Since it's being a tendency among paravirt clocksources to use values
>>> around 400, we should declare tsc as even better: So we use 500.
>>>   
>> Why is the TSC better than a paravirt clocksource?  In our case this 
>> is definitely inaccurate.  Paravirt clocksources should be preferred 
>> to TSC, and both must be made available in hardware for platforms 
>> which do not support paravirt.
>> 
>
> if it's inaccurate why are you exposing it to the guest then? Native 
> only uses the TSC if it's safe and accurate to do so.
>   

It is used as part of the Xen clocksource as a short term extrapolator,
with correction parameters supplied by the hypervisor.  It should never
be used directly.

J
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-29 Thread Ingo Molnar

* Zachary Amsden <[EMAIL PROTECTED]> wrote:

> On Mon, 2007-10-29 at 20:10 -0300, Glauber de Oliveira Costa wrote:
> > From: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
> > 
> > tsc is very good time source (when it does not have drifts, does not
> > change it's frequency, i.e. when it works), so it should have its rating
> > raised to a value greater than, or equal 400.
> > 
> > Since it's being a tendency among paravirt clocksources to use values
> > around 400, we should declare tsc as even better: So we use 500.
> 
> Why is the TSC better than a paravirt clocksource?  In our case this 
> is definitely inaccurate.  Paravirt clocksources should be preferred 
> to TSC, and both must be made available in hardware for platforms 
> which do not support paravirt.

if it's inaccurate why are you exposing it to the guest then? Native 
only uses the TSC if it's safe and accurate to do so.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-29 Thread Jeremy Fitzhardinge
Zachary Amsden wrote:
> On Mon, 2007-10-29 at 20:10 -0300, Glauber de Oliveira Costa wrote:
>   
>> From: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
>>
>> tsc is very good time source (when it does not have drifts, does not
>> change it's frequency, i.e. when it works), so it should have its rating
>> raised to a value greater than, or equal 400.
>>
>> Since it's being a tendency among paravirt clocksources to use values
>> around 400, we should declare tsc as even better: So we use 500.
>> 
>
> Why is the TSC better than a paravirt clocksource?  In our case this is
> definitely inaccurate.  Paravirt clocksources should be preferred to
> TSC, and both must be made available in hardware for platforms which do
> not support paravirt.
>
> Also, please cc all the paravirt developers on things related to
> paravirt, especially things with such broad effect.  I think 400 is a
> good value for a perfect native clocksource.  >400 should be reserved
> for super-real (i.e. paravirt) sources that should always be chosen over
> a hardware realistic implementation in a virtual environment.
>   

Yes, agreed.  The tsc is never the right thing to use if there's a
paravirt clocksource available.

What's wrong with rating it 300?  What inferior clocksource does it lose
out to?  Shouldn't that clocksource be lowered?  (Why don't we just use
1 to 10?)

J
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-29 Thread Zachary Amsden
On Mon, 2007-10-29 at 20:10 -0300, Glauber de Oliveira Costa wrote:
> From: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
> 
> tsc is very good time source (when it does not have drifts, does not
> change it's frequency, i.e. when it works), so it should have its rating
> raised to a value greater than, or equal 400.
> 
> Since it's being a tendency among paravirt clocksources to use values
> around 400, we should declare tsc as even better: So we use 500.

Why is the TSC better than a paravirt clocksource?  In our case this is
definitely inaccurate.  Paravirt clocksources should be preferred to
TSC, and both must be made available in hardware for platforms which do
not support paravirt.

Also, please cc all the paravirt developers on things related to
paravirt, especially things with such broad effect.  I think 400 is a
good value for a perfect native clocksource.  >400 should be reserved
for super-real (i.e. paravirt) sources that should always be chosen over
a hardware realistic implementation in a virtual environment.

Zach

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-29 Thread Ingo Molnar

* Thomas Gleixner <[EMAIL PROTECTED]> wrote:

> > Since it's being a tendency among paravirt clocksources to use 
> > values around 400, we should declare tsc as even better: So we use 
> > 500.
> > 
> > This patch also touches the comments on clocksource.h, which 
> > suggests that 499 would be a limit on the rating values.
> > 
> > Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
> 
> Acked-by: Thomas Gleixner <[EMAIL PROTECTED]>

Acked-by: Ingo Molnar <[EMAIL PROTECTED]>

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-29 Thread Thomas Gleixner
On Mon, 29 Oct 2007, Glauber de Oliveira Costa wrote:

CC'ed John and removed [EMAIL PROTECTED] :)

> From: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
> 
> tsc is very good time source (when it does not have drifts, does not
> change it's frequency, i.e. when it works), so it should have its rating
> raised to a value greater than, or equal 400.
> 
> Since it's being a tendency among paravirt clocksources to use values
> around 400, we should declare tsc as even better: So we use 500.
> 
> This patch also touches the comments on clocksource.h, which suggests
> that 499 would be a limit on the rating values.
> 
> Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>

Acked-by: Thomas Gleixner <[EMAIL PROTECTED]>

> ---
>  arch/x86/kernel/tsc_32.c|2 +-
>  arch/x86/kernel/tsc_64.c|2 +-
>  include/linux/clocksource.h |2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/tsc_32.c b/arch/x86/kernel/tsc_32.c
> index 9ebc0da..4d91e59 100644
> --- a/arch/x86/kernel/tsc_32.c
> +++ b/arch/x86/kernel/tsc_32.c
> @@ -280,7 +280,7 @@ static cycle_t read_tsc(void)
>  
>  static struct clocksource clocksource_tsc = {
>   .name   = "tsc",
> - .rating = 300,
> + .rating = 500,
>   .read   = read_tsc,
>   .mask   = CLOCKSOURCE_MASK(64),
>   .mult   = 0, /* to be set */
> diff --git a/arch/x86/kernel/tsc_64.c b/arch/x86/kernel/tsc_64.c
> index 9c70af4..4fd5b1b 100644
> --- a/arch/x86/kernel/tsc_64.c
> +++ b/arch/x86/kernel/tsc_64.c
> @@ -262,7 +262,7 @@ static cycle_t __vsyscall_fn vread_tsc(void)
>  
>  static struct clocksource clocksource_tsc = {
>   .name   = "tsc",
> - .rating = 300,
> + .rating = 500,
>   .read   = read_tsc,
>   .mask   = CLOCKSOURCE_MASK(64),
>   .shift  = 22,
> diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
> index 107787a..5b0aadd 100644
> --- a/include/linux/clocksource.h
> +++ b/include/linux/clocksource.h
> @@ -39,7 +39,7 @@ struct clocksource;
>   *   A correct and usable clocksource.
>   *   300-399: Desired.
>   *   A reasonably fast and accurate clocksource.
> - *   400-499: Perfect
> + *   >= 400 : Perfect
>   *   The ideal clocksource. A must-use where
>   *   available.
>   * @read:returns a cycle value
> -- 
> 1.5.0.6
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-29 Thread Ingo Molnar

* Thomas Gleixner [EMAIL PROTECTED] wrote:

  Since it's being a tendency among paravirt clocksources to use 
  values around 400, we should declare tsc as even better: So we use 
  500.
  
  This patch also touches the comments on clocksource.h, which 
  suggests that 499 would be a limit on the rating values.
  
  Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
 
 Acked-by: Thomas Gleixner [EMAIL PROTECTED]

Acked-by: Ingo Molnar [EMAIL PROTECTED]

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-29 Thread Jeremy Fitzhardinge
Zachary Amsden wrote:
 On Mon, 2007-10-29 at 20:10 -0300, Glauber de Oliveira Costa wrote:
   
 From: Glauber de Oliveira Costa [EMAIL PROTECTED]

 tsc is very good time source (when it does not have drifts, does not
 change it's frequency, i.e. when it works), so it should have its rating
 raised to a value greater than, or equal 400.

 Since it's being a tendency among paravirt clocksources to use values
 around 400, we should declare tsc as even better: So we use 500.
 

 Why is the TSC better than a paravirt clocksource?  In our case this is
 definitely inaccurate.  Paravirt clocksources should be preferred to
 TSC, and both must be made available in hardware for platforms which do
 not support paravirt.

 Also, please cc all the paravirt developers on things related to
 paravirt, especially things with such broad effect.  I think 400 is a
 good value for a perfect native clocksource.  400 should be reserved
 for super-real (i.e. paravirt) sources that should always be chosen over
 a hardware realistic implementation in a virtual environment.
   

Yes, agreed.  The tsc is never the right thing to use if there's a
paravirt clocksource available.

What's wrong with rating it 300?  What inferior clocksource does it lose
out to?  Shouldn't that clocksource be lowered?  (Why don't we just use
1 to 10?)

J
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-29 Thread Ingo Molnar

* Zachary Amsden [EMAIL PROTECTED] wrote:

 On Mon, 2007-10-29 at 20:10 -0300, Glauber de Oliveira Costa wrote:
  From: Glauber de Oliveira Costa [EMAIL PROTECTED]
  
  tsc is very good time source (when it does not have drifts, does not
  change it's frequency, i.e. when it works), so it should have its rating
  raised to a value greater than, or equal 400.
  
  Since it's being a tendency among paravirt clocksources to use values
  around 400, we should declare tsc as even better: So we use 500.
 
 Why is the TSC better than a paravirt clocksource?  In our case this 
 is definitely inaccurate.  Paravirt clocksources should be preferred 
 to TSC, and both must be made available in hardware for platforms 
 which do not support paravirt.

if it's inaccurate why are you exposing it to the guest then? Native 
only uses the TSC if it's safe and accurate to do so.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-29 Thread Zachary Amsden
On Mon, 2007-10-29 at 20:10 -0300, Glauber de Oliveira Costa wrote:
 From: Glauber de Oliveira Costa [EMAIL PROTECTED]
 
 tsc is very good time source (when it does not have drifts, does not
 change it's frequency, i.e. when it works), so it should have its rating
 raised to a value greater than, or equal 400.
 
 Since it's being a tendency among paravirt clocksources to use values
 around 400, we should declare tsc as even better: So we use 500.

Why is the TSC better than a paravirt clocksource?  In our case this is
definitely inaccurate.  Paravirt clocksources should be preferred to
TSC, and both must be made available in hardware for platforms which do
not support paravirt.

Also, please cc all the paravirt developers on things related to
paravirt, especially things with such broad effect.  I think 400 is a
good value for a perfect native clocksource.  400 should be reserved
for super-real (i.e. paravirt) sources that should always be chosen over
a hardware realistic implementation in a virtual environment.

Zach

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-29 Thread Thomas Gleixner
On Mon, 29 Oct 2007, Glauber de Oliveira Costa wrote:

CC'ed John and removed [EMAIL PROTECTED] :)

 From: Glauber de Oliveira Costa [EMAIL PROTECTED]
 
 tsc is very good time source (when it does not have drifts, does not
 change it's frequency, i.e. when it works), so it should have its rating
 raised to a value greater than, or equal 400.
 
 Since it's being a tendency among paravirt clocksources to use values
 around 400, we should declare tsc as even better: So we use 500.
 
 This patch also touches the comments on clocksource.h, which suggests
 that 499 would be a limit on the rating values.
 
 Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]

Acked-by: Thomas Gleixner [EMAIL PROTECTED]

 ---
  arch/x86/kernel/tsc_32.c|2 +-
  arch/x86/kernel/tsc_64.c|2 +-
  include/linux/clocksource.h |2 +-
  3 files changed, 3 insertions(+), 3 deletions(-)
 
 diff --git a/arch/x86/kernel/tsc_32.c b/arch/x86/kernel/tsc_32.c
 index 9ebc0da..4d91e59 100644
 --- a/arch/x86/kernel/tsc_32.c
 +++ b/arch/x86/kernel/tsc_32.c
 @@ -280,7 +280,7 @@ static cycle_t read_tsc(void)
  
  static struct clocksource clocksource_tsc = {
   .name   = tsc,
 - .rating = 300,
 + .rating = 500,
   .read   = read_tsc,
   .mask   = CLOCKSOURCE_MASK(64),
   .mult   = 0, /* to be set */
 diff --git a/arch/x86/kernel/tsc_64.c b/arch/x86/kernel/tsc_64.c
 index 9c70af4..4fd5b1b 100644
 --- a/arch/x86/kernel/tsc_64.c
 +++ b/arch/x86/kernel/tsc_64.c
 @@ -262,7 +262,7 @@ static cycle_t __vsyscall_fn vread_tsc(void)
  
  static struct clocksource clocksource_tsc = {
   .name   = tsc,
 - .rating = 300,
 + .rating = 500,
   .read   = read_tsc,
   .mask   = CLOCKSOURCE_MASK(64),
   .shift  = 22,
 diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
 index 107787a..5b0aadd 100644
 --- a/include/linux/clocksource.h
 +++ b/include/linux/clocksource.h
 @@ -39,7 +39,7 @@ struct clocksource;
   *   A correct and usable clocksource.
   *   300-399: Desired.
   *   A reasonably fast and accurate clocksource.
 - *   400-499: Perfect
 + *   = 400 : Perfect
   *   The ideal clocksource. A must-use where
   *   available.
   * @read:returns a cycle value
 -- 
 1.5.0.6
 
 -
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-29 Thread Ingo Molnar

* Zachary Amsden [EMAIL PROTECTED] wrote:

 On Mon, 2007-10-29 at 23:48 +0100, Ingo Molnar wrote:
  * Zachary Amsden [EMAIL PROTECTED] wrote:
 
  if it's inaccurate why are you exposing it to the guest then? Native 
  only uses the TSC if it's safe and accurate to do so.
 
 Not every guest support paravirt, but for correctness, all guests 
 require TSC, which must be exposed all the way up to userspace, no 
 matter what the efficiency or accuracy may be.

but if there's a perfect TSC available (there is such hardware) then the 
TSC _is_ the best clocksource. Paravirt now turns it off unconditionally 
in essence.

anyway, that's at most an optimization issue. No strong feelings here, 
and we can certainly delay this patch until this gets all sorted out.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-29 Thread Zachary Amsden
On Tue, 2007-10-30 at 00:02 +0100, Ingo Molnar wrote:
 * Zachary Amsden [EMAIL PROTECTED] wrote:

  Not every guest support paravirt, but for correctness, all guests 
  require TSC, which must be exposed all the way up to userspace, no 
  matter what the efficiency or accuracy may be.
 
 but if there's a perfect TSC available (there is such hardware) then the 
 TSC _is_ the best clocksource. Paravirt now turns it off unconditionally 
 in essence.

No, if no paravirt clocksource is detected, nothing can override the
perfect TSC hardware clocksource rating of 400.  And if a paravirt
clocksource is detected, it is always better than TSC.

 anyway, that's at most an optimization issue. No strong feelings here, 
 and we can certainly delay this patch until this gets all sorted out.

This patch should be nacked, since it is just wrong.  This is not an
optimization issue.  It is an accuracy issue for all virtualization
environments that affects long term kernel clock stability, which is
important to fix, and the best way to do that is to use a paravirt
clocksource.

Zach

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-29 Thread Ingo Molnar

* Zachary Amsden [EMAIL PROTECTED] wrote:

 On Tue, 2007-10-30 at 00:02 +0100, Ingo Molnar wrote:
  * Zachary Amsden [EMAIL PROTECTED] wrote:
 
   Not every guest support paravirt, but for correctness, all guests 
   require TSC, which must be exposed all the way up to userspace, no 
   matter what the efficiency or accuracy may be.
  
  but if there's a perfect TSC available (there is such hardware) then the 
  TSC _is_ the best clocksource. Paravirt now turns it off unconditionally 
  in essence.
 
 No, if no paravirt clocksource is detected, nothing can override the
 perfect TSC hardware clocksource rating of 400.  And if a paravirt
 clocksource is detected, it is always better than TSC.
 
  anyway, that's at most an optimization issue. No strong feelings here, 
  and we can certainly delay this patch until this gets all sorted out.
 
 This patch should be nacked, since it is just wrong.  This is not an 
 optimization issue.  It is an accuracy issue for all virtualization 
 environments that affects long term kernel clock stability, which is 
 important to fix, and the best way to do that is to use a paravirt 
 clocksource.

i know it's not an optimization issue. Your current pessimisation of 
even perfect TSCs _is_ an optimization issue.

(and note that if the TSC is unstable the guest will/should notice it 
and will fall back to the hyper clocksource)

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-29 Thread Jeremy Fitzhardinge
Ingo Molnar wrote:
 that's totally broken then. You cannot create an SMP-safe monotonic 
 clocksource via interpolation - native does not do it either. Good thing 
 this problem got exposed, it needs to be fixed.
   

Sigh, I don't really want to have this fight again.

I don't really see what point there is in raising the tsc's rating (why
is 300 insufficient again?), but regardless of the value, the Xen
clocksource rating needs to be higher.

J
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-29 Thread Ingo Molnar

* Jeremy Fitzhardinge [EMAIL PROTECTED] wrote:

  if it's inaccurate why are you exposing it to the guest then? Native 
  only uses the TSC if it's safe and accurate to do so.
 
 It is used as part of the Xen clocksource as a short term 
 extrapolator, with correction parameters supplied by the hypervisor.  
 It should never be used directly.

that's totally broken then. You cannot create an SMP-safe monotonic 
clocksource via interpolation - native does not do it either. Good thing 
this problem got exposed, it needs to be fixed.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-29 Thread Jeremy Fitzhardinge
Ingo Molnar wrote:
 * Zachary Amsden [EMAIL PROTECTED] wrote:

   
 On Mon, 2007-10-29 at 20:10 -0300, Glauber de Oliveira Costa wrote:
 
 From: Glauber de Oliveira Costa [EMAIL PROTECTED]

 tsc is very good time source (when it does not have drifts, does not
 change it's frequency, i.e. when it works), so it should have its rating
 raised to a value greater than, or equal 400.

 Since it's being a tendency among paravirt clocksources to use values
 around 400, we should declare tsc as even better: So we use 500.
   
 Why is the TSC better than a paravirt clocksource?  In our case this 
 is definitely inaccurate.  Paravirt clocksources should be preferred 
 to TSC, and both must be made available in hardware for platforms 
 which do not support paravirt.
 

 if it's inaccurate why are you exposing it to the guest then? Native 
 only uses the TSC if it's safe and accurate to do so.
   

It is used as part of the Xen clocksource as a short term extrapolator,
with correction parameters supplied by the hypervisor.  It should never
be used directly.

J
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-29 Thread Ingo Molnar

* Jeremy Fitzhardinge [EMAIL PROTECTED] wrote:

 Ingo Molnar wrote:
  that's totally broken then. You cannot create an SMP-safe monotonic 
  clocksource via interpolation - native does not do it either. Good thing 
  this problem got exposed, it needs to be fixed.

 
 Sigh, I don't really want to have this fight again.

i dont remember us having discussed this before, ever. If there's any 
fight about monotonicity and SMP then it would be a pretty onesided 
affair, with you being beaten up seriously ;-)

 I don't really see what point there is in raising the tsc's rating 
 (why is 300 insufficient again?), but regardless of the value, the Xen 
 clocksource rating needs to be higher.

anyway, i agree that this patch cannot go in in its current form.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-29 Thread Dan Hecht

On 10/29/2007 04:02 PM, Ingo Molnar wrote:

* Zachary Amsden [EMAIL PROTECTED] wrote:


On Mon, 2007-10-29 at 23:48 +0100, Ingo Molnar wrote:

* Zachary Amsden [EMAIL PROTECTED] wrote:
if it's inaccurate why are you exposing it to the guest then? Native 
only uses the TSC if it's safe and accurate to do so.
Not every guest support paravirt, but for correctness, all guests 
require TSC, which must be exposed all the way up to userspace, no 
matter what the efficiency or accuracy may be.


but if there's a perfect TSC available (there is such hardware) then the 
TSC _is_ the best clocksource. Paravirt now turns it off unconditionally 
in essence.




Not really.  In the case hardware TSC is perfect, the paravirt time 
counter can be implemented directly in terms of hardware TSC; there is 
no loss in optimization.  This is done transparently.  And virtual TSC 
can be implemented this way too.


The real improvement that a paravirt clocksource offers over the TSC 
clocksource is that the guest does not need to measure the TSC frequency 
itself against some other constant frequency source (which is 
problematic on a virtual machine).  Instead, the paravirt clocksource 
queries the hypervisor for the frequency of the counter.  As you know, 
with clocksource style kernels, it's important to get this frequency 
correct, or else the guest will have long-term time drift.



-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-29 Thread Zachary Amsden
On Mon, 2007-10-29 at 23:48 +0100, Ingo Molnar wrote:
 * Zachary Amsden [EMAIL PROTECTED] wrote:

 if it's inaccurate why are you exposing it to the guest then? Native 
 only uses the TSC if it's safe and accurate to do so.

Not every guest support paravirt, but for correctness, all guests
require TSC, which must be exposed all the way up to userspace, no
matter what the efficiency or accuracy may be.

Zach

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-29 Thread Jeremy Fitzhardinge
Ingo Molnar wrote:
 i dont remember us having discussed this before, ever. If there's any 
 fight about monotonicity and SMP then it would be a pretty onesided 
 affair, with you being beaten up seriously ;-)
   

This is part of Xen's ABI, so it isn't easily changed.  You're right
that getting monotonicity is a pretty ugly affair of doing something
like if (now  previous_return) return ++previous_return;, but its
better than using the tsc.

(Last time around, it was Xen can't use the tsc because it can't ever
be used as a stable timebase - though I don't think that was you
specifically.)

J
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-29 Thread H. Peter Anvin

Glauber de Oliveira Costa wrote:

From: Glauber de Oliveira Costa [EMAIL PROTECTED]

tsc is very good time source (when it does not have drifts, does not
change it's frequency, i.e. when it works), so it should have its rating
raised to a value greater than, or equal 400.

Since it's being a tendency among paravirt clocksources to use values
around 400, we should declare tsc as even better: So we use 500.

This patch also touches the comments on clocksource.h, which suggests
that 499 would be a limit on the rating values.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]


Are you sure these paravirt sources don't intentionally trump the TSC?

-hpa

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] raise tsc clocksource rating

2007-10-29 Thread Ian Pratt
  Sigh, I don't really want to have this fight again.
 
 i dont remember us having discussed this before, ever. If there's any
 fight about monotonicity and SMP then it would be a pretty onesided
 affair, with you being beaten up seriously ;-)

Actually, it is possible, even for NUMA systems with CPUs running off
completely different oscillators, and in the presence of CPU frequency
changes, power management, and even in the presence of thermal
throttling (though the latter introduces temporary inaccuracies it
doesn't affect monotonicity or rate).

Take a look at the Xen code to see how each physical CPU is
independently calibrated on an ongoing basis, how movement of VCPUs
between physical CPUs is tracked, and how shared variables are used to
ensure montonicity if a guest requires it. 

The fixed-rate TSCs on newer CPUs make some of this stuff easier, but
you still need to cope with different source oscillators and some power
management states.
 
Ian

  I don't really see what point there is in raising the tsc's rating
  (why is 300 insufficient again?), but regardless of the value, the
Xen
  clocksource rating needs to be higher.
 
 anyway, i agree that this patch cannot go in in its current form.
 
   Ingo
 -
 To unsubscribe from this list: send the line unsubscribe
linux-kernel
 in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-29 Thread john stultz
On Mon, 2007-10-29 at 23:17 +0100, Thomas Gleixner wrote:
 On Mon, 29 Oct 2007, Glauber de Oliveira Costa wrote:
 
 CC'ed John and removed [EMAIL PROTECTED] :)
 
  From: Glauber de Oliveira Costa [EMAIL PROTECTED]
  
  tsc is very good time source (when it does not have drifts, does not
  change it's frequency, i.e. when it works), so it should have its rating
  raised to a value greater than, or equal 400.

But all of those qualities (and more) make it less then ideal.

What issue exactly does this patch resolve?

  Since it's being a tendency among paravirt clocksources to use values
  around 400, we should declare tsc as even better: So we use 500.

This is the rating inflation I was initially worried about and created
the advisory scale to avoid. If paravirt clocksources are rated too
high, they should be dropped. 

I strongly disagree that the TSC is a perfect clocksource, and I think
its rating is appropriate. 


  This patch also touches the comments on clocksource.h, which suggests
  that 499 would be a limit on the rating values.
  
  Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
 
 Acked-by: Thomas Gleixner [EMAIL PROTECTED]

If a better justification can be provided, I might reconsider, but right
now I can't ack this.

thanks
-john

  ---
   arch/x86/kernel/tsc_32.c|2 +-
   arch/x86/kernel/tsc_64.c|2 +-
   include/linux/clocksource.h |2 +-
   3 files changed, 3 insertions(+), 3 deletions(-)
  
  diff --git a/arch/x86/kernel/tsc_32.c b/arch/x86/kernel/tsc_32.c
  index 9ebc0da..4d91e59 100644
  --- a/arch/x86/kernel/tsc_32.c
  +++ b/arch/x86/kernel/tsc_32.c
  @@ -280,7 +280,7 @@ static cycle_t read_tsc(void)
   
   static struct clocksource clocksource_tsc = {
  .name   = tsc,
  -   .rating = 300,
  +   .rating = 500,
  .read   = read_tsc,
  .mask   = CLOCKSOURCE_MASK(64),
  .mult   = 0, /* to be set */
  diff --git a/arch/x86/kernel/tsc_64.c b/arch/x86/kernel/tsc_64.c
  index 9c70af4..4fd5b1b 100644
  --- a/arch/x86/kernel/tsc_64.c
  +++ b/arch/x86/kernel/tsc_64.c
  @@ -262,7 +262,7 @@ static cycle_t __vsyscall_fn vread_tsc(void)
   
   static struct clocksource clocksource_tsc = {
  .name   = tsc,
  -   .rating = 300,
  +   .rating = 500,
  .read   = read_tsc,
  .mask   = CLOCKSOURCE_MASK(64),
  .shift  = 22,
  diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
  index 107787a..5b0aadd 100644
  --- a/include/linux/clocksource.h
  +++ b/include/linux/clocksource.h
  @@ -39,7 +39,7 @@ struct clocksource;
* A correct and usable clocksource.
* 300-399: Desired.
* A reasonably fast and accurate clocksource.
  - * 400-499: Perfect
  + * = 400 : Perfect
* The ideal clocksource. A must-use where
* available.
* @read:  returns a cycle value
  -- 
  1.5.0.6
  
  -
  To unsubscribe from this list: send the line unsubscribe linux-kernel in
  the body of a message to [EMAIL PROTECTED]
  More majordomo info at  http://vger.kernel.org/majordomo-info.html
  Please read the FAQ at  http://www.tux.org/lkml/
  

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] raise tsc clocksource rating

2007-10-29 Thread Rusty Russell
On Tuesday 30 October 2007 09:17:38 Thomas Gleixner wrote:
 On Mon, 29 Oct 2007, Glauber de Oliveira Costa wrote:

 CC'ed John and removed [EMAIL PROTECTED] :)

  From: Glauber de Oliveira Costa [EMAIL PROTECTED]
 
  tsc is very good time source (when it does not have drifts, does not
  change it's frequency, i.e. when it works), so it should have its rating
  raised to a value greater than, or equal 400.
 
  Since it's being a tendency among paravirt clocksources to use values
  around 400, we should declare tsc as even better: So we use 500.
 
  This patch also touches the comments on clocksource.h, which suggests
  that 499 would be a limit on the rating values.
 
  Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]

 Acked-by: Thomas Gleixner [EMAIL PROTECTED]

No.  tsc is very good, it's not perfect.  If a paravirt clock registers 400 it 
really means pick me over the tsc.

That's *why* they use  400: it's in the documentation.

Rusty.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/