On 08/08/2017 14:53, Stefan Hajnoczi wrote: > On Fri, Aug 04, 2017 at 01:46:17PM +0200, Paolo Bonzini wrote: >> On 04/08/2017 11:58, Stefan Hajnoczi wrote: >>>> the root cause of this bug is related to this as well: >>>> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02945.html >>>> >>>> From commit 99723548 we started assuming (incorrectly?) that blk_ >>>> functions always WILL have an attached BDS, but this is not always true, >>>> for instance, flushing the cache from an empty CDROM. >>>> >>>> Paolo, can we move the flight counter increment outside of the >>>> block-backend layer, is that safe? >>> I think the bdrv_inc_in_flight(blk_bs(blk)) needs to be fixed >>> regardless of the throttling timer issue discussed below. BB cannot >>> assume that the BDS graph is non-empty. >> >> Can we make bdrv_aio_* return NULL (even temporarily) if there is no >> attached BDS? That would make it much easier to fix. > > There are many blk_aio_*() callers. Returning NULL forces them to > perform extra error handling.
Most of them either are for non-removable devices and check for a non-empty BB at startup. The others (ide-cd and scsi-cd, sd) check the BlockBackend's blk_is_available or blk_is_inserted state themselves. It does require some auditing of course, remember that anything that would return NULL, would now be crashing already in bdrv_inc_in_flight. We'd be seeing NULL pointer dereferences left and right, if it were so bad. > When you say "temporarily" do you mean it returns NULL but schedules a > one-shot BH to invoke the callback? I wonder if we can use a singleton > aiocb instead of NULL for -ENOMEDIUM errors. No, I mean undoing that in 2.11. Paolo
signature.asc
Description: OpenPGP digital signature