On Thu, Mar 11, 2021 at 09:01:51AM +0000, Paul Durrant wrote:
> On 10/03/2021 14:58, Jason Andryuk wrote:
> > Hi,
> >
> > I was running a loop of `xl block-attach ; xl block-detach` and I
> > triggered a BUG in xen-blkfront, drivers/block/xen-blkfront.c:1917
> > This is BUG_ON(info->nr_rings) in negotiate_mq called by blkback_changed.
> >
> > I'm using Linux 5.4.103 and blktap3 on Xen 4.12 (OpenXT), though I
> > don't think that matters. The backtrace and some preceding logs (from
> > the reproducer) are below.
> >
> > I just repro-ed with this:
> > path=<backend path/state>
> > xenstore-write $path 5 ; xenstore-write $path 4
> >
> > info->nr_rings is still set because of the unexpected transition
> > XenbusStateClosing -> XenbusStateConnected:
> > dom7: [ 2866.574853] vbd vbd-51728: blkfront:blkback_changed to state 5.
> > dom7: [ 2866.578385] vbd vbd-51728: blkfront:blkback_changed to state 4.
> >
> > I'm not totally sure how to handle this. The XenbusStateConnected
> > event should be creating a new blkfront device, but instead it's seen
> > by the old one which hasn't been cleaned up yet.

IIRC xenbus state changes (like the ones you perform above) never
trigger the creation or destruction of devices on the bus. See
xenbus_otherend_changed.

xl block-detach, however, should indeed remove the device. We should
add an option to `xl block-detach -w` to wait for the device to
actually be removed before returning (or exit with a timeout).

> >
>
> Sounds like blkfront needs to be fixed. Once it is in state 5 the only state
> it should go to should be 6. From there it can cycle back to 4.

Indeed, there's likely some logic to be improved in blkfront so it
doesn't get messed up so badly by blkback state changes.

I'm happy to review patches for both blkfront and libxl/xl in order to
make this better :).

Thanks, Roger.
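
As a rough illustration of the rule Paul describes above (once the backend has been seen in state 5, the only acceptable next state is 6, and 4 may follow 6), here is a minimal standalone C sketch. It is not blkfront code and not a proposed patch: the enum and the helper `sketch_backend_transition_ok` are hypothetical names made up for this example, and the enum only mirrors the numeric XenbusState values. Real code would use enum xenbus_state and would have to decide what to do with a rejected transition, which this sketch leaves open.

```c
#include <stdbool.h>

/*
 * Local mirror of the XenbusState numbering, for illustration only.
 * These names are hypothetical; kernel code would use enum xenbus_state.
 */
enum sketch_xenbus_state {
    SKETCH_STATE_UNKNOWN      = 0,
    SKETCH_STATE_INITIALISING = 1,
    SKETCH_STATE_INIT_WAIT    = 2,
    SKETCH_STATE_INITIALISED  = 3,
    SKETCH_STATE_CONNECTED    = 4,
    SKETCH_STATE_CLOSING      = 5,
    SKETCH_STATE_CLOSED       = 6,
};

/*
 * Hypothetical helper: given the last backend state the frontend saw
 * (prev) and the newly reported one (next), say whether the frontend
 * should act on the change.
 *
 * Only the rule from the thread is encoded: after Closing (5), nothing
 * but Closed (6) is accepted. Every other transition is passed through,
 * which is an assumption of this sketch, not a statement about xenbus.
 */
static bool sketch_backend_transition_ok(enum sketch_xenbus_state prev,
                                         enum sketch_xenbus_state next)
{
    if (prev == SKETCH_STATE_CLOSING)
        return next == SKETCH_STATE_CLOSED;

    return true;
}
```

With such a check, the reproducer sequence 5 -> 4 would be rejected rather than reaching negotiate_mq with info->nr_rings still set, while 5 -> 6 -> 4 would be let through.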