Re: [libvirt] RFC: libxl race fixes

2013-01-18 Thread Daniel P. Berrange
On Thu, Jan 17, 2013 at 10:29:35PM -0700, Jim Fehlig wrote:
 Jim Fehlig wrote:
  I've been investigating some races in the libxl driver and would like to
  get comments on some potential solutions.
 
  The first race is in the fd/timeout event handling code, which maps
  libxl's osevent interface to libvirt's event loop interface.  This
  mapping opens the possibility for libvirt's event loop to invoke
  event callbacks after libxl has already deregistered the event,
  potentially accessing an event object that has already been freed.
 
  One solution to this race I've found successful is reference counting the
  objects associated with the events.  When libxl registers an event, an
  object encapsulating the event is created and it's reference count is set
  to 1.  When the event is injected into libvirt's event loop, another
  reference is taken on the object.  When libxl deregisters the event, it's
  reference count is decremented.  Once the event is removed from libvirt's
  event loop, the final reference is decremented and the object is disposed.

 
 While rebasing this solution to use danpb's recent virObjectLockable
 change, I revisited using only a lock in the libxlDomainObject,
 acquiring the lock in the registration, deregistration, modify, and
 callback functions.  I recall some problems with this approach before,
 but now find it to be sufficient.  Perhaps a misplaced lock drove me to
 the unneeded complexity of this double lock nonsense...
 
  This approach ensures the object is not disposed until it is removed
  from libvirt's event loop *and* libxl had explicitly deregistered the event.
  The notion of an event being 'disabled' found in libvirt's event loop impl
  also had to be added to the libxl event object, to ensure the driver doesn't
  call into libxl for a previously deregistered event.
 
  The second race is between destroying a vm (i.e., calling
  privateDataFreeFunc, which frees the libxl ctx) and deregistration/cleanup
  of all events that have been registered by libxl.
 
  One solution for this race is to convert libxlDomainObjPrivate to a
  virObject and increment its reference for each fd and timeout registration.

 
 Only change here is using the new virObjectLockable.

I'm surprised that you need to have a separate lock for libxlDomainObjPrivate.
The lifetime of that object is tied to the lifetime of the virDomainObjPtr,
which already has a lock. Isn't it sufficient to do locking on the latter,
whenever the libxlDomainObjPrivate need protecting ?


Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] RFC: libxl race fixes

2013-01-18 Thread Jim Fehlig
Daniel P. Berrange wrote:
 On Thu, Jan 17, 2013 at 10:29:35PM -0700, Jim Fehlig wrote:
   
 Jim Fehlig wrote:
 
 I've been investigating some races in the libxl driver and would like to
 get comments on some potential solutions.

 The first race is in the fd/timeout event handling code, which maps
 libxl's osevent interface to libvirt's event loop interface.  This
 mapping opens the possibility for libvirt's event loop to invoke
 event callbacks after libxl has already deregistered the event,
 potentially accessing an event object that has already been freed.

 One solution to this race I've found successful is reference counting the
 objects associated with the events.  When libxl registers an event, an
 object encapsulating the event is created and it's reference count is set
 to 1.  When the event is injected into libvirt's event loop, another
 reference is taken on the object.  When libxl deregisters the event, it's
 reference count is decremented.  Once the event is removed from libvirt's
 event loop, the final reference is decremented and the object is disposed.
   
   
 While rebasing this solution to use danpb's recent virObjectLockable
 change, I revisited using only a lock in the libxlDomainObject,
 acquiring the lock in the registration, deregistration, modify, and
 callback functions.  I recall some problems with this approach before,
 but now find it to be sufficient.  Perhaps a misplaced lock drove me to
 the unneeded complexity of this double lock nonsense...

 
 This approach ensures the object is not disposed until it is removed
 from libvirt's event loop *and* libxl had explicitly deregistered the event.
 The notion of an event being 'disabled' found in libvirt's event loop impl
 also had to be added to the libxl event object, to ensure the driver doesn't
 call into libxl for a previously deregistered event.

 The second race is between destroying a vm (i.e., calling
 privateDataFreeFunc, which frees the libxl ctx) and deregistration/cleanup
 of all events that have been registered by libxl.

 One solution for this race is to convert libxlDomainObjPrivate to a
 virObject and increment its reference for each fd and timeout registration.
   
   
 Only change here is using the new virObjectLockable.
 

 I'm surprised that you need to have a separate lock for libxlDomainObjPrivate.
 The lifetime of that object is tied to the lifetime of the virDomainObjPtr,
 which already has a lock. Isn't it sufficient to do locking on the latter,
 whenever the libxlDomainObjPrivate need protecting ?
   

Yeah, I thought it would be sufficient too, but fd/timeout events can
happen during vm operations where the virDomainObj is locked.  E.g.
during vm creation, the virDomainObj is locked but fd/timeout events
occur as libxl reads/writes to xenstore, sets watches, reads a
kernel/initrd from host, etc.  It seems cleaner to have a lock in the
libxlDomainObjPrivate for accessing libxl's event registrations.

Regards,
Jim

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] RFC: libxl race fixes

2013-01-17 Thread Jim Fehlig
Jim Fehlig wrote:
 I've been investigating some races in the libxl driver and would like to
 get comments on some potential solutions.

 The first race is in the fd/timeout event handling code, which maps
 libxl's osevent interface to libvirt's event loop interface.  This
 mapping opens the possibility for libvirt's event loop to invoke
 event callbacks after libxl has already deregistered the event,
 potentially accessing an event object that has already been freed.

 One solution to this race I've found successful is reference counting the
 objects associated with the events.  When libxl registers an event, an
 object encapsulating the event is created and it's reference count is set
 to 1.  When the event is injected into libvirt's event loop, another
 reference is taken on the object.  When libxl deregisters the event, it's
 reference count is decremented.  Once the event is removed from libvirt's
 event loop, the final reference is decremented and the object is disposed.
   

While rebasing this solution to use danpb's recent virObjectLockable
change, I revisited using only a lock in the libxlDomainObject,
acquiring the lock in the registration, deregistration, modify, and
callback functions.  I recall some problems with this approach before,
but now find it to be sufficient.  Perhaps a misplaced lock drove me to
the unneeded complexity of this double lock nonsense...

 This approach ensures the object is not disposed until it is removed
 from libvirt's event loop *and* libxl had explicitly deregistered the event.
 The notion of an event being 'disabled' found in libvirt's event loop impl
 also had to be added to the libxl event object, to ensure the driver doesn't
 call into libxl for a previously deregistered event.

 The second race is between destroying a vm (i.e., calling
 privateDataFreeFunc, which frees the libxl ctx) and deregistration/cleanup
 of all events that have been registered by libxl.

 One solution for this race is to convert libxlDomainObjPrivate to a
 virObject and increment its reference for each fd and timeout registration.
   

Only change here is using the new virObjectLockable.

 Only when all fds and timeouts are deregistered and destroyed will the
 libxlDomainObjPrivate be destroyed.  One downside to this approach is that
 an API to cleanup the libxl ctx is needed.  Without such an API, some
 fds are not deregistered until calling libxl_ctx_free.  But since the fd
 events have references on the libxlDomainObjPrivate, libxl_ctx_free is
 never called.  (BTW I have a patch adding libxl_ctx_quiesce() to libxl,
 which upstream xen folks seem receptive to, including backporting to
 Xen 4.2 branch, but I don't think this is ideal.)

 An alternate solution that can be used to address both of these races
 is to maintain a list of the fd/timeout registrations in the
 libxlDomainObjPrivate object, and take a more brute force approach to
 managing the registrations
   

Turns out only timeouts need special attention.  Their deregistration is
asynchronous wrt freeing the libxl ctx.  Keeping a list of active
timeouts and explicitly cleaning them up prior to freeing the libxl ctx
works well in my testing, and requires no changes to libxl.  That said,
all of my testing has included two libxl patches from Ian Jackson [1],
which will be needed in conjunction with the driver fixes to have a
stable libxl stack.

I'll post my libxl driver fixes after testing their behavior without
Ian's patches.

Regards,
Jim

[1] http://lists.xen.org/archives/html/xen-devel/2012-12/msg00684.html
http://lists.xen.org/archives/html/xen-devel/2012-12/msg00685.html

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] RFC: libxl race fixes

2013-01-16 Thread Jim Fehlig
I've been investigating some races in the libxl driver and would like to
get comments on some potential solutions.

The first race is in the fd/timeout event handling code, which maps
libxl's osevent interface to libvirt's event loop interface.  This
mapping opens the possibility for libvirt's event loop to invoke
event callbacks after libxl has already deregistered the event,
potentially accessing an event object that has already been freed.

One solution to this race I've found successful is reference counting the
objects associated with the events.  When libxl registers an event, an
object encapsulating the event is created and it's reference count is set
to 1.  When the event is injected into libvirt's event loop, another
reference is taken on the object.  When libxl deregisters the event, it's
reference count is decremented.  Once the event is removed from libvirt's
event loop, the final reference is decremented and the object is disposed.
This approach ensures the object is not disposed until it is removed
from libvirt's event loop *and* libxl had explicitly deregistered the event.
The notion of an event being 'disabled' found in libvirt's event loop impl
also had to be added to the libxl event object, to ensure the driver doesn't
call into libxl for a previously deregistered event.

The second race is between destroying a vm (i.e., calling
privateDataFreeFunc, which frees the libxl ctx) and deregistration/cleanup
of all events that have been registered by libxl.

One solution for this race is to convert libxlDomainObjPrivate to a
virObject and increment its reference for each fd and timeout registration.
Only when all fds and timeouts are deregistered and destroyed will the
libxlDomainObjPrivate be destroyed.  One downside to this approach is that
an API to cleanup the libxl ctx is needed.  Without such an API, some
fds are not deregistered until calling libxl_ctx_free.  But since the fd
events have references on the libxlDomainObjPrivate, libxl_ctx_free is
never called.  (BTW I have a patch adding libxl_ctx_quiesce() to libxl,
which upstream xen folks seem receptive to, including backporting to
Xen 4.2 branch, but I don't think this is ideal.)

An alternate solution that can be used to address both of these races
is to maintain a list of the fd/timeout registrations in the
libxlDomainObjPrivate object, and take a more brute force approach to
managing the registrations

Other solutions that I might be overlooking are certainly appreciated :).

Regards,
Jim

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list