Re: KVM Clock

2014-12-10 Thread Ryan Stone
On Mon, Jan 20, 2014 at 7:41 PM, Julian Stecklina
 wrote:
> Ah. Thanks. This will do. Something like access_once would be perfect, though.
>
> I'll post an updated patch that does not duplicate code that is in xen/ 
> soonish. Didn't get around to testing it the last days.
>
> Julian

I'm interesting in helping to get this into the tree.  I see that
you've address all of the comments in this thread here:

https://github.com/blitz/freebsd/commit/e130f4ba550eb905c7f56a6a26243660b4df834b

I have one final question.  Don't we need to run rdtsc() in the loop
in pvclock_fetch_vcpu_tinfo()?  If we don't, isn't there a chance that
after we leave the loop, the hypervisor could migrate the guest to a
different CPU where the value returned by rdtsc() will not match the
parameters in the pvclock_time_info structure?
___
freebsd-virtualization@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-virtualization
To unsubscribe, send any mail to 
"freebsd-virtualization-unsubscr...@freebsd.org"


Re: KVM Clock

2014-01-20 Thread Julian Stecklina


John Baldwin  wrote:
>There is the __compiler_membar() macro in  that you could
>use if 
>this code is x86-specific (and thus knows it only needs a compiler
>barrier).

Ah. Thanks. This will do. Something like access_once would be perfect, though.

I'll post an updated patch that does not duplicate code that is in xen/ 
soonish. Didn't get around to testing it the last days.

Julian
-- 
Out of office. Please excuse my brevity.
___
freebsd-virtualization@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-virtualization
To unsubscribe, send any mail to 
"freebsd-virtualization-unsubscr...@freebsd.org"


Re: KVM Clock

2014-01-20 Thread John Baldwin
On Wednesday 15 January 2014 17:59:10 Julian Stecklina wrote:
> On Mi, 2014-01-15 at 17:04 +0100, Roger Pau Monné wrote:
> > On 15/01/14 14:45, Julian Stecklina wrote:
> > > On Mi, 2014-01-15 at 14:08 +0100, Roger Pau Monné wrote:
> > >> On 15/01/14 13:05, Julian Stecklina wrote:
> > >>> On 01/14/2014 05:13 PM, Peter Grehan wrote:
> > >>>> Hi Julian,
> > >>>> 
> > >>>>> is anyone working on KVM clock support for FreeBSD? If not, I
> > >>>>> might take a shot at it.
> > >>>> 
> > >>>> None I know of: go for it :)
> > >>> 
> > >>> Works for me so far:
> > >>> https://github.com/blitz/freebsd/commit/cdc5f872b3e48cc0dda031fc7d6bde
> > >>> dc65c3148f> >> 
> > >> Looking
> > >> 
> > >> at the code it seems some common routines could be shared
> > >> between the KVM PV clock and the Xen PV clock
> > >> (sys/dev/xen/timer/timer.c). The data passed from the hypervisor to
> > >> the guest has exactly the same format (see struct vcpu_time_info in
> > >> Xen public headers).
> > > 
> > > Yes, I know. Didn't get around to making it pretty yesterday evening. ;)
> > > I'll post an updated patch, when I have some time. Any suggestions where
> > > to put the two common functions?
> > > 
> > >> At a first sight the KVM clock can benefit from using scale_delta
> > >> (which is going to be faster than the C version implemented in
> > >> kvmclock_get_timecount),
> > > 
> > > At least somewhat on amd64. 32-bit looks pretty identical.
> > > 
> > >>  and xen_fetch_vcpu_tinfo is exactly the same
> > >> 
> > >> as kvmclock_fetch.
> > > 
> > > I think xen_fetch_vcpu_tinfo has a subtle bug:
> > >   217 do {
> > >   218 dst->version = src->version;
> > >   219 rmb();
> > >   220 dst->tsc_timestamp = src->tsc_timestamp;
> > >   221 dst->system_time = src->system_time;
> > >   222 dst->tsc_to_system_mul = src->tsc_to_system_mul;
> > >   223 dst->tsc_shift = src->tsc_shift;
> > >   224 rmb();
> > >   225 } while ((src->version & 1) | (dst->version ^
> > > 
> > > src->version));
> > > 
> > > In line 225 src->version is potentially read twice. If you consider the
> > > following schedule:
> > > 
> > > host starts updating data, sets src->version to 1
> > > guest reads src->version (1) and stores it into dst->version.
> > > guest copies inconsistent data
> > > guest reads src->version (1) and computes xor with dst->version.
> > > host finishes updating data and sets src->version to 2
> > > guest reads src->version (2) and checks whether lower bit is not set.
> > > while loop exits with inconsistent data!
> > > 
> > > I think the C standard (at least C11) permits this as it does not
> > > specify in which order the two reads in line 225 need to happen.
> > 
> > According to the operator precedence and associativity in C
> > (http://en.wikipedia.org/wiki/Operators_in_C_and_C%2B%2B#Operator_preceden
> > ce), if I'm reading it right, the condition in the while line will be
> > evaluated as follows (because of the left-to-right associativity of the
> > 
> > | operator):
> > 1. (src->version & 1)
> > 2. (dst->version ^ src->version)
> > 3. result of 1 | result of 2
> > 
> > So AFAICT the flow that you describe could never happen, because
> > (src->version & 1) is always done before (dst->version ^ src->version).
> 
> Operator precedence doesn't matter. Consider it written like this:
> 
> or(and(src->version, 1), xor(dst->version, src->version))
> 
> C gives you no guarantees whether the and or the xor will be executed
> first. There is no sequence point in between. And even with a sequence
> point, the C11 memory model gives you no guarantees about how the reads
> are ordered.
> 
> This discussion is somewhat theoretical, because
>  a) the hypervisor will probably update the data structure in the
> current vCPU context (making memory consistency issues go away).
>  b) the compiler will likely not be an asshole. ;)
> 
> Pseudocode for a better fetch could be:
> 
> dst->version = src->version;
> rmb();
> ... read data ...
> rmb();
> version_post = src->version;
> rmb(); <- keeps the compiler from reading src->version multiple times
> 
> and then doing the test with version_post and dst->version.
> Unfortunately, rmb() expands into lfence, even if there is no need for
> that here.

There is the __compiler_membar() macro in  that you could use if 
this code is x86-specific (and thus knows it only needs a compiler barrier).

-- 
John Baldwin
___
freebsd-virtualization@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-virtualization
To unsubscribe, send any mail to 
"freebsd-virtualization-unsubscr...@freebsd.org"


Re: KVM Clock

2014-01-15 Thread Julian Stecklina
On Mi, 2014-01-15 at 17:04 +0100, Roger Pau Monné wrote:
> On 15/01/14 14:45, Julian Stecklina wrote:
> > On Mi, 2014-01-15 at 14:08 +0100, Roger Pau Monné wrote:
> >> On 15/01/14 13:05, Julian Stecklina wrote:
> >>> On 01/14/2014 05:13 PM, Peter Grehan wrote:
> >>>> Hi Julian,
> >>>>
> >>>>> is anyone working on KVM clock support for FreeBSD? If not, I
> >>>>> might take a shot at it.
> >>>>
> >>>> None I know of: go for it :)
> >>>
> >>> Works for me so far: 
> >>> https://github.com/blitz/freebsd/commit/cdc5f872b3e48cc0dda031fc7d6bdedc65c3148f
> >>
> >> Looking
> >>>
> >> at the code it seems some common routines could be shared
> >> between the KVM PV clock and the Xen PV clock
> >> (sys/dev/xen/timer/timer.c). The data passed from the hypervisor to
> >> the guest has exactly the same format (see struct vcpu_time_info in
> >> Xen public headers).
> > 
> > Yes, I know. Didn't get around to making it pretty yesterday evening. ;)
> > I'll post an updated patch, when I have some time. Any suggestions where
> > to put the two common functions?
> > 
> >> At a first sight the KVM clock can benefit from using scale_delta
> >> (which is going to be faster than the C version implemented in
> >> kvmclock_get_timecount),
> > 
> > At least somewhat on amd64. 32-bit looks pretty identical.
> > 
> >>  and xen_fetch_vcpu_tinfo is exactly the same
> >> as kvmclock_fetch.
> > 
> > I think xen_fetch_vcpu_tinfo has a subtle bug:
> >   217 do {
> >   218 dst->version = src->version;
> >   219 rmb();
> >   220 dst->tsc_timestamp = src->tsc_timestamp;
> >   221 dst->system_time = src->system_time;
> >   222 dst->tsc_to_system_mul = src->tsc_to_system_mul;
> >   223 dst->tsc_shift = src->tsc_shift;
> >   224 rmb();
> >   225 } while ((src->version & 1) | (dst->version ^
> > src->version));
> > 
> > In line 225 src->version is potentially read twice. If you consider the
> > following schedule:
> > 
> > host starts updating data, sets src->version to 1
> > guest reads src->version (1) and stores it into dst->version.
> > guest copies inconsistent data
> > guest reads src->version (1) and computes xor with dst->version.
> > host finishes updating data and sets src->version to 2
> > guest reads src->version (2) and checks whether lower bit is not set.
> > while loop exits with inconsistent data!
> > 
> > I think the C standard (at least C11) permits this as it does not
> > specify in which order the two reads in line 225 need to happen.
> 
> According to the operator precedence and associativity in C
> (http://en.wikipedia.org/wiki/Operators_in_C_and_C%2B%2B#Operator_precedence),
> if I'm reading it right, the condition in the while line will be
> evaluated as follows (because of the left-to-right associativity of the
> | operator):
> 
> 1. (src->version & 1)
> 2. (dst->version ^ src->version)
> 3. result of 1 | result of 2
> 
> So AFAICT the flow that you describe could never happen, because
> (src->version & 1) is always done before (dst->version ^ src->version).

Operator precedence doesn't matter. Consider it written like this:

or(and(src->version, 1), xor(dst->version, src->version))

C gives you no guarantees whether the and or the xor will be executed
first. There is no sequence point in between. And even with a sequence
point, the C11 memory model gives you no guarantees about how the reads
are ordered.

This discussion is somewhat theoretical, because
 a) the hypervisor will probably update the data structure in the
current vCPU context (making memory consistency issues go away).
 b) the compiler will likely not be an asshole. ;)

Pseudocode for a better fetch could be:

dst->version = src->version;
rmb();
... read data ...
rmb();
version_post = src->version;
rmb(); <- keeps the compiler from reading src->version multiple times

and then doing the test with version_post and dst->version.
Unfortunately, rmb() expands into lfence, even if there is no need for
that here. 

Regards,
Julian

___
freebsd-virtualization@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-virtualization
To unsubscribe, send any mail to 
"freebsd-virtualization-unsubscr...@freebsd.org"

Re: KVM Clock

2014-01-15 Thread Roger Pau Monné
On 15/01/14 14:45, Julian Stecklina wrote:
> On Mi, 2014-01-15 at 14:08 +0100, Roger Pau Monné wrote:
>> On 15/01/14 13:05, Julian Stecklina wrote:
>>> On 01/14/2014 05:13 PM, Peter Grehan wrote:
>>>> Hi Julian,
>>>>
>>>>> is anyone working on KVM clock support for FreeBSD? If not, I
>>>>> might take a shot at it.
>>>>
>>>> None I know of: go for it :)
>>>
>>> Works for me so far: 
>>> https://github.com/blitz/freebsd/commit/cdc5f872b3e48cc0dda031fc7d6bdedc65c3148f
>>
>> Looking
>>>
>> at the code it seems some common routines could be shared
>> between the KVM PV clock and the Xen PV clock
>> (sys/dev/xen/timer/timer.c). The data passed from the hypervisor to
>> the guest has exactly the same format (see struct vcpu_time_info in
>> Xen public headers).
> 
> Yes, I know. Didn't get around to making it pretty yesterday evening. ;)
> I'll post an updated patch, when I have some time. Any suggestions where
> to put the two common functions?
> 
>> At a first sight the KVM clock can benefit from using scale_delta
>> (which is going to be faster than the C version implemented in
>> kvmclock_get_timecount),
> 
> At least somewhat on amd64. 32-bit looks pretty identical.
> 
>>  and xen_fetch_vcpu_tinfo is exactly the same
>> as kvmclock_fetch.
> 
> I think xen_fetch_vcpu_tinfo has a subtle bug:
>   217 do {
>   218 dst->version = src->version;
>   219 rmb();
>   220 dst->tsc_timestamp = src->tsc_timestamp;
>   221 dst->system_time = src->system_time;
>   222 dst->tsc_to_system_mul = src->tsc_to_system_mul;
>   223 dst->tsc_shift = src->tsc_shift;
>   224 rmb();
>   225 } while ((src->version & 1) | (dst->version ^
> src->version));
> 
> In line 225 src->version is potentially read twice. If you consider the
> following schedule:
> 
> host starts updating data, sets src->version to 1
> guest reads src->version (1) and stores it into dst->version.
> guest copies inconsistent data
> guest reads src->version (1) and computes xor with dst->version.
> host finishes updating data and sets src->version to 2
> guest reads src->version (2) and checks whether lower bit is not set.
> while loop exits with inconsistent data!
> 
> I think the C standard (at least C11) permits this as it does not
> specify in which order the two reads in line 225 need to happen.

According to the operator precedence and associativity in C
(http://en.wikipedia.org/wiki/Operators_in_C_and_C%2B%2B#Operator_precedence),
if I'm reading it right, the condition in the while line will be
evaluated as follows (because of the left-to-right associativity of the
| operator):

1. (src->version & 1)
2. (dst->version ^ src->version)
3. result of 1 | result of 2

So AFAICT the flow that you describe could never happen, because
(src->version & 1) is always done before (dst->version ^ src->version).

Roger.
___
freebsd-virtualization@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-virtualization
To unsubscribe, send any mail to 
"freebsd-virtualization-unsubscr...@freebsd.org"

Re: KVM Clock

2014-01-15 Thread Julian Stecklina
On Mi, 2014-01-15 at 14:08 +0100, Roger Pau Monné wrote:
> On 15/01/14 13:05, Julian Stecklina wrote:
> > On 01/14/2014 05:13 PM, Peter Grehan wrote:
> >> Hi Julian,
> >> 
> >>> is anyone working on KVM clock support for FreeBSD? If not, I
> >>> might take a shot at it.
> >> 
> >> None I know of: go for it :)
> > 
> > Works for me so far: 
> > https://github.com/blitz/freebsd/commit/cdc5f872b3e48cc0dda031fc7d6bdedc65c3148f
> 
> Looking
> > 
> at the code it seems some common routines could be shared
> between the KVM PV clock and the Xen PV clock
> (sys/dev/xen/timer/timer.c). The data passed from the hypervisor to
> the guest has exactly the same format (see struct vcpu_time_info in
> Xen public headers).

Yes, I know. Didn't get around to making it pretty yesterday evening. ;)
I'll post an updated patch, when I have some time. Any suggestions where
to put the two common functions?

> At a first sight the KVM clock can benefit from using scale_delta
> (which is going to be faster than the C version implemented in
> kvmclock_get_timecount),

At least somewhat on amd64. 32-bit looks pretty identical.

>  and xen_fetch_vcpu_tinfo is exactly the same
> as kvmclock_fetch.

I think xen_fetch_vcpu_tinfo has a subtle bug:
  217 do {
  218 dst->version = src->version;
  219 rmb();
  220 dst->tsc_timestamp = src->tsc_timestamp;
  221 dst->system_time = src->system_time;
  222 dst->tsc_to_system_mul = src->tsc_to_system_mul;
  223 dst->tsc_shift = src->tsc_shift;
  224 rmb();
  225 } while ((src->version & 1) | (dst->version ^
src->version));

In line 225 src->version is potentially read twice. If you consider the
following schedule:

host starts updating data, sets src->version to 1
guest reads src->version (1) and stores it into dst->version.
guest copies inconsistent data
guest reads src->version (1) and computes xor with dst->version.
host finishes updating data and sets src->version to 2
guest reads src->version (2) and checks whether lower bit is not set.
while loop exits with inconsistent data!

I think the C standard (at least C11) permits this as it does not
specify in which order the two reads in line 225 need to happen.

Julian

___
freebsd-virtualization@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-virtualization
To unsubscribe, send any mail to 
"freebsd-virtualization-unsubscr...@freebsd.org"

Re: KVM Clock

2014-01-15 Thread Roger Pau Monné
On 15/01/14 13:05, Julian Stecklina wrote:
> On 01/14/2014 05:13 PM, Peter Grehan wrote:
>> Hi Julian,
>> 
>>> is anyone working on KVM clock support for FreeBSD? If not, I
>>> might take a shot at it.
>> 
>> None I know of: go for it :)
> 
> Works for me so far: 
> https://github.com/blitz/freebsd/commit/cdc5f872b3e48cc0dda031fc7d6bdedc65c3148f

Looking
> 
at the code it seems some common routines could be shared
between the KVM PV clock and the Xen PV clock
(sys/dev/xen/timer/timer.c). The data passed from the hypervisor to
the guest has exactly the same format (see struct vcpu_time_info in
Xen public headers).

At a first sight the KVM clock can benefit from using scale_delta
(which is going to be faster than the C version implemented in
kvmclock_get_timecount), and xen_fetch_vcpu_tinfo is exactly the same
as kvmclock_fetch.

Roger.

___
freebsd-virtualization@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-virtualization
To unsubscribe, send any mail to 
"freebsd-virtualization-unsubscr...@freebsd.org"


Re: KVM Clock

2014-01-15 Thread Julian Stecklina
On 01/14/2014 05:13 PM, Peter Grehan wrote:
> Hi Julian,
> 
>> is anyone working on KVM clock support for FreeBSD? If not, I might take
>> a shot at it.
> 
>  None I know of: go for it :)

Works for me so far:
https://github.com/blitz/freebsd/commit/cdc5f872b3e48cc0dda031fc7d6bdedc65c3148f

If anyone wants to test this, use any recent qemu with KVM:
wget http://os.inf.tu-dresden.de/~jsteckli/tmp/mfsbsd-kvmclock.iso
qemu-kvm -m 1024 -cdrom mfsbsd-kvmclock.iso.

and watch for

KVM-style paravirtualized clock detected.
Timecounter "KVMCLOCK" frequency 10 Hz quality 1000

in the kernel log.

Julian




signature.asc
Description: OpenPGP digital signature


Re: KVM Clock

2014-01-14 Thread Peter Grehan

Hi Julian,


is anyone working on KVM clock support for FreeBSD? If not, I might take
a shot at it.


 None I know of: go for it :)

later,

Peter.
___
freebsd-virtualization@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-virtualization
To unsubscribe, send any mail to 
"freebsd-virtualization-unsubscr...@freebsd.org"


KVM Clock

2014-01-14 Thread Julian Stecklina
Hello,

is anyone working on KVM clock support for FreeBSD? If not, I might take
a shot at it.

Julian



signature.asc
Description: OpenPGP digital signature