Re: [Xen-devel] [PATCH] KVM, XEN: Fix potential race in pvclock code

2014-01-28 Thread Konrad Rzeszutek Wilk
On Mon, Jan 27, 2014 at 01:33:01PM +0100, Julian Stecklina wrote:
 On 01/24/2014 07:08 PM, Konrad Rzeszutek Wilk wrote:
  On Thu, Jan 16, 2014 at 03:13:44PM +0100, Julian Stecklina wrote:
  The paravirtualized clock used in KVM and Xen uses a version field to
  allow the guest to see when the shared data structure is inconsistent.
  The code reads the version field twice (before and after the data
  structure is copied) and checks whether they haven't
  changed and that the lowest bit is not set. As the second access is not
  synchronized, the compiler could generate code that accesses version
  more than two times and you end up with inconsistent data.
  
  Could you paste in the code that the 'bad' compiler generates
  vs the compiler that generate 'good' code please?
 
 At least 4.8 and probably older compilers compile this as intended. The
 point is that the standard does not guarantee the indented behavior,
 i.e. the code is wrong.

Perhaps I misunderstood Jan's response but it sounded to me like
that the compiler was not adhering to the standard?

 
 I can refer to this lwn article:
 https://lwn.net/Articles/508991/
 
 The whole point of ACCESS_ONCE is to avoid time bombs like that. There
 are lots of place where ACCESS_ONCE is used in the kernel:
 
 http://lxr.free-electrons.com/ident?i=ACCESS_ONCE
 
 See for example the check_zero function here:
 http://lxr.free-electrons.com/source/arch/x86/kernel/kvm.c#L559
 

In other words, you don't have a sample of 'bad' compiler code.


 Julian
 
  
 
  An example using pvclock_get_time_values:
 
  host starts updating data, sets src-version to 1
  guest reads src-version (1) and stores it into dst-version.
  guest copies inconsistent data
  guest reads src-version (1) and computes xor with dst-version.
  host finishes updating data and sets src-version to 2
  guest reads src-version (2) and checks whether lower bit is not set.
  while loop exits with inconsistent data!
 
  AFAICS the compiler is allowed to optimize the given code this way.
 
  Signed-off-by: Julian Stecklina jstec...@os.inf.tu-dresden.de
  ---
   arch/x86/kernel/pvclock.c | 10 +++---
   1 file changed, 7 insertions(+), 3 deletions(-)
 
  diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
  index 42eb330..f62b41c 100644
  --- a/arch/x86/kernel/pvclock.c
  +++ b/arch/x86/kernel/pvclock.c
  @@ -55,6 +55,8 @@ static u64 pvclock_get_nsec_offset(struct 
  pvclock_shadow_time *shadow)
   static unsigned pvclock_get_time_values(struct pvclock_shadow_time *dst,
 struct pvclock_vcpu_time_info *src)
   {
  +  u32 nversion;
  +
 do {
 dst-version = src-version;
 rmb();  /* fetch version before data */
  @@ -64,7 +66,8 @@ static unsigned pvclock_get_time_values(struct 
  pvclock_shadow_time *dst,
 dst-tsc_shift = src-tsc_shift;
 dst-flags = src-flags;
 rmb();  /* test version after fetching data */
  -  } while ((src-version  1) || (dst-version != src-version));
  +  nversion = ACCESS_ONCE(src-version);
  +  } while ((nversion  1) || (dst-version != nversion));
   
 return dst-version;
   }
  @@ -135,7 +138,7 @@ void pvclock_read_wallclock(struct pvclock_wall_clock 
  *wall_clock,
 struct pvclock_vcpu_time_info *vcpu_time,
 struct timespec *ts)
   {
  -  u32 version;
  +  u32 version, nversion;
 u64 delta;
 struct timespec now;
   
  @@ -146,7 +149,8 @@ void pvclock_read_wallclock(struct pvclock_wall_clock 
  *wall_clock,
 now.tv_sec  = wall_clock-sec;
 now.tv_nsec = wall_clock-nsec;
 rmb();  /* fetch time before checking version */
  -  } while ((wall_clock-version  1) || (version != wall_clock-version));
  +  nversion = ACCESS_ONCE(wall_clock-version);
  +  } while ((nversion  1) || (version != nversion));
   
 delta = pvclock_clocksource_read(vcpu_time);/* time since system 
  boot */
 delta += now.tv_sec * (u64)NSEC_PER_SEC + now.tv_nsec;
  -- 
  1.8.4.2
 
 
  ___
  Xen-devel mailing list
  xen-de...@lists.xen.org
  http://lists.xen.org/xen-devel
 
 


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCH] KVM, XEN: Fix potential race in pvclock code

2014-01-28 Thread Konrad Rzeszutek Wilk
On Mon, Jan 27, 2014 at 06:47:58PM +0100, Paolo Bonzini wrote:
 Il 17/01/2014 10:41, Jan Beulich ha scritto:
  One half of this doesn't apply here, due to the explicit barriers
  that are there. The half about converting local variable accesses
  back to memory reads (i.e. eliding the local variable), however,
  is only a theoretical issue afaict: If a compiler really did this, I
  think there'd be far more places where this would hurt.
 
 Perhaps.  But for example seqlocks get it right.
 
  I don't think so - this would only be an issue if the conditions used
  | instead of ||. || implies a sequence point between evaluating the
  left and right sides, and the standard says: The presence of a
  sequence point between the evaluation of expressions A and B
  implies that every value computation and side effect associated
  with A is sequenced before every value computation and side
  effect associated with B.
 
 I suspect this is widely ignored by compilers if A is not 
 side-effecting.  The above wording would imply that
 
  x = a || b=x = (a | b) != 0
 
 (where a and b are non-volatile globals) would be an invalid 
 change.  The compiler would have to do:
 
  temp = a;
  barrier();
  x = (temp | b) != 0
 
 and I'm pretty sure that no compiler does it this way unless C11/C++11
 atomics are involved (at which point accesses become side-effecting).
 
 The code has changed and pvclock_get_time_values moved to
 __pvclock_read_cycles, but I think the problem remains.  Another approach
 to fixing this (and one I prefer) is to do the same thing as seqlocks:
 turn off the low bit in the return value of __pvclock_read_cycles,
 and drop the || altogether.  Untested patch after my name.

Is there a good test-case to confirm that this patch does not introduce
any regressions?


 
 Paolo
 
 diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
 index d6b078e9fa28..5aec80adaf54 100644
 --- a/arch/x86/include/asm/pvclock.h
 +++ b/arch/x86/include/asm/pvclock.h
 @@ -75,7 +75,7 @@ unsigned __pvclock_read_cycles(const struct 
 pvclock_vcpu_time_info *src,
   cycle_t ret, offset;
   u8 ret_flags;
  
 - version = src-version;
 + version = src-version  ~1;
   /* Note: emulated platforms which do not advertise SSE2 support
* result in kvmclock not using the necessary RDTSC barriers.
* Without barriers, it is possible that RDTSC instruction reads from
 diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
 index 2f355d229a58..a5052a87d55e 100644
 --- a/arch/x86/kernel/pvclock.c
 +++ b/arch/x86/kernel/pvclock.c
 @@ -66,7 +66,7 @@ u8 pvclock_read_flags(struct pvclock_vcpu_time_info *src)
  
   do {
   version = __pvclock_read_cycles(src, ret, flags);
 - } while ((src-version  1) || version != src-version);
 + } while (version != src-version);
  
   return flags  valid_flags;
  }
 @@ -80,7 +80,7 @@ cycle_t pvclock_clocksource_read(struct 
 pvclock_vcpu_time_info *src)
  
   do {
   version = __pvclock_read_cycles(src, ret, flags);
 - } while ((src-version  1) || version != src-version);
 + } while (version != src-version);
  
   if (unlikely((flags  PVCLOCK_GUEST_STOPPED) != 0)) {
   src-flags = ~PVCLOCK_GUEST_STOPPED;
 diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
 index eb5d7a56f8d4..f09b09bcb515 100644
 --- a/arch/x86/vdso/vclock_gettime.c
 +++ b/arch/x86/vdso/vclock_gettime.c
 @@ -117,7 +117,6 @@ static notrace cycle_t vread_pvclock(int *mode)
*/
   cpu1 = __getcpu()  VGETCPU_CPU_MASK;
   } while (unlikely(cpu != cpu1 ||
 -   (pvti-pvti.version  1) ||
 pvti-pvti.version != version));
  
   if (unlikely(!(flags  PVCLOCK_TSC_STABLE_BIT)))
 
 
 ___
 Xen-devel mailing list
 xen-de...@lists.xen.org
 http://lists.xen.org/xen-devel
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCH] KVM, XEN: Fix potential race in pvclock code

2014-01-28 Thread Julian Stecklina
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 01/28/2014 04:16 PM, Konrad Rzeszutek Wilk wrote:
 At least 4.8 and probably older compilers compile this as
 intended. The point is that the standard does not guarantee the
 indented behavior, i.e. the code is wrong.
 
 Perhaps I misunderstood Jan's response but it sounded to me like 
 that the compiler was not adhering to the standard?

Compilers are free to generate code that breaks the current clock code
while staying within the standards. If the compiler ignores the
standard, all bets are off anyway...

 
 I can refer to this lwn article: 
 https://lwn.net/Articles/508991/
 
 The whole point of ACCESS_ONCE is to avoid time bombs like that.
 There are lots of place where ACCESS_ONCE is used in the kernel:
 
 http://lxr.free-electrons.com/ident?i=ACCESS_ONCE
 
 See for example the check_zero function here: 
 http://lxr.free-electrons.com/source/arch/x86/kernel/kvm.c#L559
 
 
 In other words, you don't have a sample of 'bad' compiler code.

Nope.

Julian

-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.22 (GNU/Linux)

iEUEARECAAYFAlLn1ZgACgkQ2EtjUdW3H9n/DgCSAmrVCyxrs42aFFB3Ug+aw4kN
7wCgmEO4CbOI3BkOXKSorY91td9u7H8=
=fgrl
-END PGP SIGNATURE-
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCH] KVM, XEN: Fix potential race in pvclock code

2014-01-27 Thread Julian Stecklina
On 01/24/2014 07:08 PM, Konrad Rzeszutek Wilk wrote:
 On Thu, Jan 16, 2014 at 03:13:44PM +0100, Julian Stecklina wrote:
 The paravirtualized clock used in KVM and Xen uses a version field to
 allow the guest to see when the shared data structure is inconsistent.
 The code reads the version field twice (before and after the data
 structure is copied) and checks whether they haven't
 changed and that the lowest bit is not set. As the second access is not
 synchronized, the compiler could generate code that accesses version
 more than two times and you end up with inconsistent data.
 
 Could you paste in the code that the 'bad' compiler generates
 vs the compiler that generate 'good' code please?

At least 4.8 and probably older compilers compile this as intended. The
point is that the standard does not guarantee the indented behavior,
i.e. the code is wrong.

I can refer to this lwn article:
https://lwn.net/Articles/508991/

The whole point of ACCESS_ONCE is to avoid time bombs like that. There
are lots of place where ACCESS_ONCE is used in the kernel:

http://lxr.free-electrons.com/ident?i=ACCESS_ONCE

See for example the check_zero function here:
http://lxr.free-electrons.com/source/arch/x86/kernel/kvm.c#L559

Julian

 

 An example using pvclock_get_time_values:

 host starts updating data, sets src-version to 1
 guest reads src-version (1) and stores it into dst-version.
 guest copies inconsistent data
 guest reads src-version (1) and computes xor with dst-version.
 host finishes updating data and sets src-version to 2
 guest reads src-version (2) and checks whether lower bit is not set.
 while loop exits with inconsistent data!

 AFAICS the compiler is allowed to optimize the given code this way.

 Signed-off-by: Julian Stecklina jstec...@os.inf.tu-dresden.de
 ---
  arch/x86/kernel/pvclock.c | 10 +++---
  1 file changed, 7 insertions(+), 3 deletions(-)

 diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
 index 42eb330..f62b41c 100644
 --- a/arch/x86/kernel/pvclock.c
 +++ b/arch/x86/kernel/pvclock.c
 @@ -55,6 +55,8 @@ static u64 pvclock_get_nsec_offset(struct 
 pvclock_shadow_time *shadow)
  static unsigned pvclock_get_time_values(struct pvclock_shadow_time *dst,
  struct pvclock_vcpu_time_info *src)
  {
 +u32 nversion;
 +
  do {
  dst-version = src-version;
  rmb();  /* fetch version before data */
 @@ -64,7 +66,8 @@ static unsigned pvclock_get_time_values(struct 
 pvclock_shadow_time *dst,
  dst-tsc_shift = src-tsc_shift;
  dst-flags = src-flags;
  rmb();  /* test version after fetching data */
 -} while ((src-version  1) || (dst-version != src-version));
 +nversion = ACCESS_ONCE(src-version);
 +} while ((nversion  1) || (dst-version != nversion));
  
  return dst-version;
  }
 @@ -135,7 +138,7 @@ void pvclock_read_wallclock(struct pvclock_wall_clock 
 *wall_clock,
  struct pvclock_vcpu_time_info *vcpu_time,
  struct timespec *ts)
  {
 -u32 version;
 +u32 version, nversion;
  u64 delta;
  struct timespec now;
  
 @@ -146,7 +149,8 @@ void pvclock_read_wallclock(struct pvclock_wall_clock 
 *wall_clock,
  now.tv_sec  = wall_clock-sec;
  now.tv_nsec = wall_clock-nsec;
  rmb();  /* fetch time before checking version */
 -} while ((wall_clock-version  1) || (version != wall_clock-version));
 +nversion = ACCESS_ONCE(wall_clock-version);
 +} while ((nversion  1) || (version != nversion));
  
  delta = pvclock_clocksource_read(vcpu_time);/* time since system 
 boot */
  delta += now.tv_sec * (u64)NSEC_PER_SEC + now.tv_nsec;
 -- 
 1.8.4.2


 ___
 Xen-devel mailing list
 xen-de...@lists.xen.org
 http://lists.xen.org/xen-devel




signature.asc
Description: OpenPGP digital signature


Re: [Xen-devel] [PATCH] KVM, XEN: Fix potential race in pvclock code

2014-01-27 Thread Paolo Bonzini
Il 17/01/2014 10:41, Jan Beulich ha scritto:
 One half of this doesn't apply here, due to the explicit barriers
 that are there. The half about converting local variable accesses
 back to memory reads (i.e. eliding the local variable), however,
 is only a theoretical issue afaict: If a compiler really did this, I
 think there'd be far more places where this would hurt.

Perhaps.  But for example seqlocks get it right.

 I don't think so - this would only be an issue if the conditions used
 | instead of ||. || implies a sequence point between evaluating the
 left and right sides, and the standard says: The presence of a
 sequence point between the evaluation of expressions A and B
 implies that every value computation and side effect associated
 with A is sequenced before every value computation and side
 effect associated with B.

I suspect this is widely ignored by compilers if A is not 
side-effecting.  The above wording would imply that

 x = a || b=x = (a | b) != 0

(where a and b are non-volatile globals) would be an invalid 
change.  The compiler would have to do:

 temp = a;
 barrier();
 x = (temp | b) != 0

and I'm pretty sure that no compiler does it this way unless C11/C++11
atomics are involved (at which point accesses become side-effecting).

The code has changed and pvclock_get_time_values moved to
__pvclock_read_cycles, but I think the problem remains.  Another approach
to fixing this (and one I prefer) is to do the same thing as seqlocks:
turn off the low bit in the return value of __pvclock_read_cycles,
and drop the || altogether.  Untested patch after my name.

Paolo

diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index d6b078e9fa28..5aec80adaf54 100644
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -75,7 +75,7 @@ unsigned __pvclock_read_cycles(const struct 
pvclock_vcpu_time_info *src,
cycle_t ret, offset;
u8 ret_flags;
 
-   version = src-version;
+   version = src-version  ~1;
/* Note: emulated platforms which do not advertise SSE2 support
 * result in kvmclock not using the necessary RDTSC barriers.
 * Without barriers, it is possible that RDTSC instruction reads from
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 2f355d229a58..a5052a87d55e 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -66,7 +66,7 @@ u8 pvclock_read_flags(struct pvclock_vcpu_time_info *src)
 
do {
version = __pvclock_read_cycles(src, ret, flags);
-   } while ((src-version  1) || version != src-version);
+   } while (version != src-version);
 
return flags  valid_flags;
 }
@@ -80,7 +80,7 @@ cycle_t pvclock_clocksource_read(struct 
pvclock_vcpu_time_info *src)
 
do {
version = __pvclock_read_cycles(src, ret, flags);
-   } while ((src-version  1) || version != src-version);
+   } while (version != src-version);
 
if (unlikely((flags  PVCLOCK_GUEST_STOPPED) != 0)) {
src-flags = ~PVCLOCK_GUEST_STOPPED;
diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
index eb5d7a56f8d4..f09b09bcb515 100644
--- a/arch/x86/vdso/vclock_gettime.c
+++ b/arch/x86/vdso/vclock_gettime.c
@@ -117,7 +117,6 @@ static notrace cycle_t vread_pvclock(int *mode)
 */
cpu1 = __getcpu()  VGETCPU_CPU_MASK;
} while (unlikely(cpu != cpu1 ||
- (pvti-pvti.version  1) ||
  pvti-pvti.version != version));
 
if (unlikely(!(flags  PVCLOCK_TSC_STABLE_BIT)))

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCH] KVM, XEN: Fix potential race in pvclock code

2014-01-24 Thread Konrad Rzeszutek Wilk
On Thu, Jan 16, 2014 at 03:13:44PM +0100, Julian Stecklina wrote:
 The paravirtualized clock used in KVM and Xen uses a version field to
 allow the guest to see when the shared data structure is inconsistent.
 The code reads the version field twice (before and after the data
 structure is copied) and checks whether they haven't
 changed and that the lowest bit is not set. As the second access is not
 synchronized, the compiler could generate code that accesses version
 more than two times and you end up with inconsistent data.

Could you paste in the code that the 'bad' compiler generates
vs the compiler that generate 'good' code please?

 
 An example using pvclock_get_time_values:
 
 host starts updating data, sets src-version to 1
 guest reads src-version (1) and stores it into dst-version.
 guest copies inconsistent data
 guest reads src-version (1) and computes xor with dst-version.
 host finishes updating data and sets src-version to 2
 guest reads src-version (2) and checks whether lower bit is not set.
 while loop exits with inconsistent data!
 
 AFAICS the compiler is allowed to optimize the given code this way.
 
 Signed-off-by: Julian Stecklina jstec...@os.inf.tu-dresden.de
 ---
  arch/x86/kernel/pvclock.c | 10 +++---
  1 file changed, 7 insertions(+), 3 deletions(-)
 
 diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
 index 42eb330..f62b41c 100644
 --- a/arch/x86/kernel/pvclock.c
 +++ b/arch/x86/kernel/pvclock.c
 @@ -55,6 +55,8 @@ static u64 pvclock_get_nsec_offset(struct 
 pvclock_shadow_time *shadow)
  static unsigned pvclock_get_time_values(struct pvclock_shadow_time *dst,
   struct pvclock_vcpu_time_info *src)
  {
 + u32 nversion;
 +
   do {
   dst-version = src-version;
   rmb();  /* fetch version before data */
 @@ -64,7 +66,8 @@ static unsigned pvclock_get_time_values(struct 
 pvclock_shadow_time *dst,
   dst-tsc_shift = src-tsc_shift;
   dst-flags = src-flags;
   rmb();  /* test version after fetching data */
 - } while ((src-version  1) || (dst-version != src-version));
 + nversion = ACCESS_ONCE(src-version);
 + } while ((nversion  1) || (dst-version != nversion));
  
   return dst-version;
  }
 @@ -135,7 +138,7 @@ void pvclock_read_wallclock(struct pvclock_wall_clock 
 *wall_clock,
   struct pvclock_vcpu_time_info *vcpu_time,
   struct timespec *ts)
  {
 - u32 version;
 + u32 version, nversion;
   u64 delta;
   struct timespec now;
  
 @@ -146,7 +149,8 @@ void pvclock_read_wallclock(struct pvclock_wall_clock 
 *wall_clock,
   now.tv_sec  = wall_clock-sec;
   now.tv_nsec = wall_clock-nsec;
   rmb();  /* fetch time before checking version */
 - } while ((wall_clock-version  1) || (version != wall_clock-version));
 + nversion = ACCESS_ONCE(wall_clock-version);
 + } while ((nversion  1) || (version != nversion));
  
   delta = pvclock_clocksource_read(vcpu_time);/* time since system 
 boot */
   delta += now.tv_sec * (u64)NSEC_PER_SEC + now.tv_nsec;
 -- 
 1.8.4.2
 
 
 ___
 Xen-devel mailing list
 xen-de...@lists.xen.org
 http://lists.xen.org/xen-devel
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCH] KVM, XEN: Fix potential race in pvclock code

2014-01-17 Thread Jan Beulich
 On 16.01.14 at 17:04, Julian Stecklina jstec...@os.inf.tu-dresden.de 
 wrote:
 On 01/16/2014 04:04 PM, Jan Beulich wrote:
 I don't think so - this would only be an issue if the conditions used
 | instead of ||. || implies a sequence point between evaluating the
 left and right sides, and the standard says: The presence of a
 sequence point between the evaluation of expressions A and B
 implies that every value computation and side effect associated
 with A is sequenced before every value computation and side
 effect associated with B.
 
 This only applies to single-threaded code. Multithreaded code must be
 data-race free for that to be true. See
 
 https://lwn.net/Articles/508991/ 
 
 And even if there was a problem (i.e. my interpretation of the
 above being incorrect), I don't think you'd need ACCESS_ONCE()
 here: The same local variable can't have two different values in
 two different use sites when there was no intermediate
 assignment to it.
 
 Same comment as above.

One half of this doesn't apply here, due to the explicit barriers
that are there. The half about converting local variable accesses
back to memory reads (i.e. eliding the local variable), however,
is only a theoretical issue afaict: If a compiler really did this, I
think there'd be far more places where this would hurt.

Jan

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCH] KVM, XEN: Fix potential race in pvclock code

2014-01-17 Thread Julian Stecklina
On 01/17/2014 10:41 AM, Jan Beulich wrote:
 The half about converting local variable accesses
 back to memory reads (i.e. eliding the local variable), however,
 is only a theoretical issue afaict: If a compiler really did this, I
 think there'd be far more places where this would hurt.

It happens rarely, but it does happen. Not fixing those issues is
inviting trouble with new compiler generations. And these issues are
terribly hard to debug.

Julian



signature.asc
Description: OpenPGP digital signature


Re: [Xen-devel] [PATCH] KVM, XEN: Fix potential race in pvclock code

2014-01-16 Thread Jan Beulich
 On 16.01.14 at 15:13, Julian Stecklina jstec...@os.inf.tu-dresden.de 
 wrote:
 The paravirtualized clock used in KVM and Xen uses a version field to
 allow the guest to see when the shared data structure is inconsistent.
 The code reads the version field twice (before and after the data
 structure is copied) and checks whether they haven't
 changed and that the lowest bit is not set. As the second access is not
 synchronized, the compiler could generate code that accesses version
 more than two times and you end up with inconsistent data.
 
 An example using pvclock_get_time_values:
 
 host starts updating data, sets src-version to 1
 guest reads src-version (1) and stores it into dst-version.
 guest copies inconsistent data
 guest reads src-version (1) and computes xor with dst-version.
 host finishes updating data and sets src-version to 2
 guest reads src-version (2) and checks whether lower bit is not set.
 while loop exits with inconsistent data!
 
 AFAICS the compiler is allowed to optimize the given code this way.

I don't think so - this would only be an issue if the conditions used
| instead of ||. || implies a sequence point between evaluating the
left and right sides, and the standard says: The presence of a
sequence point between the evaluation of expressions A and B
implies that every value computation and side effect associated
with A is sequenced before every value computation and side
effect associated with B.

And even if there was a problem (i.e. my interpretation of the
above being incorrect), I don't think you'd need ACCESS_ONCE()
here: The same local variable can't have two different values in
two different use sites when there was no intermediate
assignment to it.

Interestingly the old XenoLinux code uses | instead of || in
both cases, yet only one of the two also entertains the use
of a local variable. I shall fix this (read: Thanks for pointing
out even if in the upstream incarnation this is not a problem).

Jan

 Signed-off-by: Julian Stecklina jstec...@os.inf.tu-dresden.de
 ---
  arch/x86/kernel/pvclock.c | 10 +++---
  1 file changed, 7 insertions(+), 3 deletions(-)
 
 diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
 index 42eb330..f62b41c 100644
 --- a/arch/x86/kernel/pvclock.c
 +++ b/arch/x86/kernel/pvclock.c
 @@ -55,6 +55,8 @@ static u64 pvclock_get_nsec_offset(struct 
 pvclock_shadow_time *shadow)
  static unsigned pvclock_get_time_values(struct pvclock_shadow_time *dst,
   struct pvclock_vcpu_time_info *src)
  {
 + u32 nversion;
 +
   do {
   dst-version = src-version;
   rmb();  /* fetch version before data */
 @@ -64,7 +66,8 @@ static unsigned pvclock_get_time_values(struct 
 pvclock_shadow_time *dst,
   dst-tsc_shift = src-tsc_shift;
   dst-flags = src-flags;
   rmb();  /* test version after fetching data */
 - } while ((src-version  1) || (dst-version != src-version));
 + nversion = ACCESS_ONCE(src-version);
 + } while ((nversion  1) || (dst-version != nversion));
  
   return dst-version;
  }
 @@ -135,7 +138,7 @@ void pvclock_read_wallclock(struct pvclock_wall_clock 
 *wall_clock,
   struct pvclock_vcpu_time_info *vcpu_time,
   struct timespec *ts)
  {
 - u32 version;
 + u32 version, nversion;
   u64 delta;
   struct timespec now;
  
 @@ -146,7 +149,8 @@ void pvclock_read_wallclock(struct pvclock_wall_clock 
 *wall_clock,
   now.tv_sec  = wall_clock-sec;
   now.tv_nsec = wall_clock-nsec;
   rmb();  /* fetch time before checking version */
 - } while ((wall_clock-version  1) || (version != wall_clock-version));
 + nversion = ACCESS_ONCE(wall_clock-version);
 + } while ((nversion  1) || (version != nversion));
  
   delta = pvclock_clocksource_read(vcpu_time);/* time since system 
 boot */
   delta += now.tv_sec * (u64)NSEC_PER_SEC + now.tv_nsec;
 -- 
 1.8.4.2
 
 
 ___
 Xen-devel mailing list
 xen-de...@lists.xen.org 
 http://lists.xen.org/xen-devel 


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCH] KVM, XEN: Fix potential race in pvclock code

2014-01-16 Thread Julian Stecklina
On 01/16/2014 04:04 PM, Jan Beulich wrote:
 I don't think so - this would only be an issue if the conditions used
 | instead of ||. || implies a sequence point between evaluating the
 left and right sides, and the standard says: The presence of a
 sequence point between the evaluation of expressions A and B
 implies that every value computation and side effect associated
 with A is sequenced before every value computation and side
 effect associated with B.

This only applies to single-threaded code. Multithreaded code must be
data-race free for that to be true. See

https://lwn.net/Articles/508991/

 And even if there was a problem (i.e. my interpretation of the
 above being incorrect), I don't think you'd need ACCESS_ONCE()
 here: The same local variable can't have two different values in
 two different use sites when there was no intermediate
 assignment to it.

Same comment as above.

Julian



signature.asc
Description: OpenPGP digital signature