On 08.05.2014 16:52, ronnie sahlberg wrote:
> On Thu, May 8, 2014 at 4:33 AM, Stefan Hajnoczi <stefa...@redhat.com> wrote:
>> On Wed, May 07, 2014 at 04:09:27PM +0200, Peter Lieven wrote:
>>> On 07.05.2014 12:29, Paolo Bonzini wrote:
>>>> On 07/05/2014 12:07, Stefan Hajnoczi wrote:
>>>>> On Fri, May 02, 2014 at 12:39:06AM +0200, Peter Lieven wrote:
>>>>>>> +static void iscsi_attach_aio_context(BlockDriverState *bs,
>>>>>>> +                                     AioContext *new_context)
>>>>>>> +{
>>>>>>> +    IscsiLun *iscsilun = bs->opaque;
>>>>>>> +
>>>>>>> +    iscsilun->aio_context = new_context;
>>>>>>> +    iscsi_set_events(iscsilun);
>>>>>>> +
>>>>>>> +#if defined(LIBISCSI_FEATURE_NOP_COUNTER)
>>>>>>> +    /* Set up a timer for sending out iSCSI NOPs */
>>>>>>> +    iscsilun->nop_timer = aio_timer_new(iscsilun->aio_context,
>>>>>>> +                                        QEMU_CLOCK_REALTIME, SCALE_MS,
>>>>>>> +                                        iscsi_nop_timed_event, iscsilun);
>>>>>>> +    timer_mod(iscsilun->nop_timer,
>>>>>>> +              qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + NOP_INTERVAL);
>>>>>>> +#endif
>>>>>>> +}
>>>>>> Is it still guaranteed that iscsi_nop_timed_event for a target is not
>>>>>> invoked while we are in another function/callback of the iscsi driver
>>>>>> for the same target?
>>>> Yes, since the timer is in the same AioContext as the iscsi driver
>>>> callbacks.
>>>
>>> Ok. Stefan: What MUST NOT happen is that the timer gets fired while we are
>>> in iscsi_service.
>>> As Paolo outlined, this cannot happen, right?
>> Okay, I think we're safe then. The timer can only be invoked during
>> aio_poll() event loop iterations. It cannot be invoked while we're
>> inside iscsi_service().
>>
>>>>> BTW, is iscsi_reconnect() the right libiscsi interface to use since it
>>>>> is synchronous? It seems like this would block QEMU until the socket
>>>>> has connected! The guest would be frozen.
>>>> There is no asynchronous interface yet for reconnection, unfortunately.
>>> We initiate the reconnect after we miss a few NOP replies. So the target
>>> has already been down for approx. 30 seconds.
>>> Every process inside the guest is already hanging or has timed out.
>>>
>>> If I understand correctly, with the new patches only the communication
>>> with this target is hanging, isn't it?
>>> So what benefit would an asynchronous reconnect have?
>> Asynchronous reconnect is desirable:
>>
>> 1. The QEMU monitor is blocked while we're waiting for the iSCSI target
>>    to accept our reconnect. This means the management stack (libvirt)
>>    cannot control QEMU until we time out or succeed.
>>
>> 2. The guest is totally frozen - cannot execute instructions - because
>>    it will soon reach a point in the code that locks the QEMU global
>>    mutex (which is being held while we reconnect to the iSCSI target).
>>
>>    This may be okayish for guests where the iSCSI LUN contains the
>>    "main" data that is being processed. But what if an iSCSI LUN was
>>    just attached to a guest that is also doing other things that are
>>    independent (e.g. serving a website, processing data from a local
>>    disk, etc) - now the reconnect causes downtime for the entire guest.
> I will look into making the reconnect async over the next few days.

Thanks for looking into this. I have a few things in mind that I will post
on GitHub to the issue you created.

Peter