On Thu 2014-11-20 19:00:16, Michael S. Tsirkin wrote:
> On Thu, Nov 20, 2014 at 05:55:58PM +0100, Petr Mladek wrote:
> > On Thu 2014-11-20 11:29:35, Tejun Heo wrote:
> > > On Thu, Nov 20, 2014 at 06:26:24PM +0200, Michael S. Tsirkin wrote:
> > > > On Thu, Nov 20, 2014 at 06:25:43PM +0200, Michael S. Tsirkin wrote:
> > > > > On Thu, Nov 20, 2014 at 11:07:46AM -0500, Tejun Heo wrote:
> > > > > > On Thu, Nov 20, 2014 at 05:03:17PM +0100, Petr Mladek wrote:
> > > > > > ...
> > > > > > > @@ -476,7 +460,6 @@ static void virtballoon_remove(struct 
> > > > > > > virtio_device *vdev)
> > > > > > >  {
> > > > > > >   struct virtio_balloon *vb = vdev->priv;
> > > > > > >  
> > > > > > > - kthread_stop(vb->thread);
> > > > > > >   remove_common(vb);
> > > > > > >   kfree(vb);
> > > > > > >  }
> > > > > > 
> > > > > > Shouldn't the work item be flushed before removal is complete?
> > 
> > Great catch!
> > 
> > > > > In fact, flushing it won't help because it can requeue itself, right?
> > > 
> > > There's cancel_work_sync() to stop the self-requeueing ones.
> > 
> > Ah, one more problem is that remove_common(vb) calls leak_balloon()
> > that queues the work if not finished. We would need to add some flag
> > or variant that would disable the queuing when called here.
> > 
> 
> That's why Tejun suggested cancel_work_sync, IIUC it stops
> the requeuing without need for extra flags.

But he also wrote that it handles only self-requeuing. Queuing from
external locations needs to be prevented in other ways.

> > > > From that POV a dedicated WQ kept it simple.
> > > 
> > > A dedicated wq doesn't do anything for that.  You can't shut down a
> > > workqueue with a pending work item on it.  destroy_workqueue() will
> > > try to drain the target wq, warn if it doesn't finish in certain
> > > number of iterations and just keep trying indefinitely.
> > 
> > I wonder if it is guaranteed that none would trigger
> > stats_request() or virtballoon_changed() when virtballoon_remove() is
> > being called. I guess so because the original code would fail
> > otherwise. The two functions access "vb->config_change"
> > and the structure is freed in virtballoon_remove() without
> > any protection.
> > 
> > I am trying to confirm this by reading the code but it is not that
> > easy.
> > 
> > Best Regards,
> > Petr
> 
> It's synchronized through hardware.  remove_common calls reset and
> del_vqs which will prevent new interrupts.

I see. So stats_request() and virtballoon_changed() can still be
called until remove_common() calls
vb->vdev->config->reset(vb->vdev).

This means that fill_balloon() can be queued and run after we leak
all pages but before we reset the device in remove_common(). I have
to think about how to avoid this. Maybe add a flag to
struct virtio_balloon that would signal that the balloon is being
removed and that new operations should no longer be queued. But there
might be a more elegant solution.
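
To make the ordering concrete, here is a rough userspace model of the
flag idea. It is only a sketch under stated assumptions: struct
balloon_model, its "removing" field and the model_*() helpers are all
hypothetical names, not anything in the driver, and the real code would
also need locking (or similar) between the flag check and the actual
queuing, which this single-threaded model leaves out.

```c
/* Userspace model only: none of these names are kernel or driver APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct virtio_balloon. */
struct balloon_model {
	bool removing;      /* assumed new flag: set once removal starts */
	bool work_pending;  /* models a queued work item */
	int  pages;         /* pages currently held by the balloon */
};

/* Models queuing from an external caller such as virtballoon_changed(). */
static bool model_queue_work(struct balloon_model *vb)
{
	assert(vb != NULL);
	if (vb->removing)
		return false;       /* refuse new work during removal */
	vb->work_pending = true;
	return true;
}

/* Models the work function: fills the balloon by one page if queued. */
static void model_run_work(struct balloon_model *vb)
{
	assert(vb != NULL);
	if (!vb->work_pending)
		return;
	vb->work_pending = false;
	vb->pages++;
}

/* Models removal: block new queuing, drop pending work, leak all pages. */
static void model_remove(struct balloon_model *vb)
{
	assert(vb != NULL);
	vb->removing = true;        /* step 1: no new work from here on */
	vb->work_pending = false;   /* step 2: stands in for cancel_work_sync() */
	vb->pages = 0;              /* step 3: stands in for leaking all pages */
}
```

With this ordering, a model_queue_work() call that arrives after
model_remove() has started is rejected, so nothing can fill the balloon
again between the leak and the device reset.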

Best Regards,
Petr
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
