attach_aio_context()

Peter Lieven Thu, 08 May 2014 08:47:16 -0700

On 08.05.2014 16:52, ronnie sahlberg wrote:
> On Thu, May 8, 2014 at 4:33 AM, Stefan Hajnoczi <stefa...@redhat.com> wrote:
>> On Wed, May 07, 2014 at 04:09:27PM +0200, Peter Lieven wrote:
>>> On 07.05.2014 12:29, Paolo Bonzini wrote:
>>>> On 07/05/2014 12:07, Stefan Hajnoczi wrote:
>>>>> On Fri, May 02, 2014 at 12:39:06AM +0200, Peter Lieven wrote:
>>>>>>> +static void iscsi_attach_aio_context(BlockDriverState *bs,
>>>>>>> +                                     AioContext *new_context)
>>>>>>> +{
>>>>>>> +    IscsiLun *iscsilun = bs->opaque;
>>>>>>> +
>>>>>>> +    iscsilun->aio_context = new_context;
>>>>>>> +    iscsi_set_events(iscsilun);
>>>>>>> +
>>>>>>> +#if defined(LIBISCSI_FEATURE_NOP_COUNTER)
>>>>>>> +    /* Set up a timer for sending out iSCSI NOPs */
>>>>>>> +    iscsilun->nop_timer = aio_timer_new(iscsilun->aio_context,
>>>>>>> +                                        QEMU_CLOCK_REALTIME, SCALE_MS,
>>>>>>> +                                        iscsi_nop_timed_event, iscsilun);
>>>>>>> +    timer_mod(iscsilun->nop_timer,
>>>>>>> +              qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + NOP_INTERVAL);
>>>>>>> +#endif
>>>>>>> +}
>>>>>> Is it still guaranteed that iscsi_nop_timed_event for a target is not
>>>>>> invoked while we are in another function/callback of the iscsi driver
>>>>>> for the same target?
>>>> Yes, since the timer is in the same AioContext as the iscsi driver 
>>>> callbacks.
>>>
>>> Ok. Stefan: What MUST NOT happen is that the timer fires while we are
>>> in iscsi_service(). As Paolo outlined, this cannot happen, right?
>> Okay, I think we're safe then.  The timer can only be invoked during
>> aio_poll() event loop iterations.  It cannot be invoked while we're
>> inside iscsi_service().
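
To illustrate the guarantee described above, here is a minimal toy model
of a single-threaded event loop. It is a sketch only, not QEMU code: all
toy_* names are hypothetical, the timer is one-shot, and time is advanced
by hand. The point it shows is that a timer whose deadline passes while
the I/O handler is running is still dispatched only by the loop iteration,
after the handler has returned.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical miniature of an event loop; none of this is QEMU's API. */
typedef void (*toy_timer_cb)(void *opaque);

struct toy_timer {
    long expire_ms;      /* absolute deadline on the toy clock */
    toy_timer_cb cb;     /* callback to run once the deadline has passed */
    void *opaque;
    bool armed;
};

struct toy_lun {
    bool in_service;     /* true while the I/O handler is running */
    int nops_sent;       /* how often the timer callback ran */
    int overlaps;        /* how often it ran while in_service was set */
    struct toy_timer nop_timer;
};

/* Stand-in for the NOP timer callback. */
static void toy_nop_timed_event(void *opaque)
{
    struct toy_lun *lun = opaque;

    if (lun->in_service) {
        lun->overlaps++;
    }
    lun->nops_sent++;
}

/* Stand-in for the I/O handler. Time passes here, so the deadline may
 * expire, but nothing in this function dispatches timers. */
static void toy_service(struct toy_lun *lun, long *now_ms)
{
    lun->in_service = true;
    *now_ms += 100;
    lun->in_service = false;
}

/* One loop iteration: run the handler to completion, then run the
 * timer if it has expired. */
static void toy_poll(struct toy_lun *lun, long *now_ms)
{
    struct toy_timer *t = &lun->nop_timer;

    toy_service(lun, now_ms);
    if (t->armed && *now_ms >= t->expire_ms) {
        t->armed = false;
        t->cb(t->opaque);
    }
}

/* Arm the timer to expire in the middle of the first handler run and
 * drive the loop for a few iterations. */
static void toy_run(struct toy_lun *lun, int iterations)
{
    long now_ms = 0;

    memset(lun, 0, sizeof(*lun));
    lun->nop_timer.expire_ms = 50;
    lun->nop_timer.cb = toy_nop_timed_event;
    lun->nop_timer.opaque = lun;
    lun->nop_timer.armed = true;

    for (int i = 0; i < iterations; i++) {
        toy_poll(lun, &now_ms);
    }
}
```

In this model the overlaps counter can never become non-zero, because
the callback is only ever called from toy_poll() after toy_service()
has returned. A second thread dispatching timers would break that.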
>>
>>>>> BTW, is iscsi_reconnect() the right libiscsi interface to use since it
>>>>> is synchronous?  It seems like this would block QEMU until the socket
>>>>> has connected!  The guest would be frozen.
>>>> There is no asynchronous interface yet for reconnection, unfortunately.
>>> We initiate the reconnect after we miss a few NOP replies. So the target
>>> has already been down for approx. 30 seconds.
>>> Every process inside the guest is already hanging or has timed out.
>>>
>>> If I understand correctly, with the new patches only the communication
>>> with this target hangs, doesn't it?
>>> So what benefit would an asynchronous reconnect have?
>> Asynchronous reconnect is desirable:
>>
>> 1. The QEMU monitor is blocked while we're waiting for the iSCSI target
>>    to accept our reconnect.  This means the management stack (libvirt)
>>    cannot control QEMU until we time out or succeed.
>>
>> 2. The guest is totally frozen - cannot execute instructions - because
>>    it will soon reach a point in the code that locks the QEMU global
>>    mutex (which is being held while we reconnect to the iSCSI target).
>>
>>    This may be okayish for guests where the iSCSI LUN contains the
>>    "main" data that is being processed.  But what if an iSCSI LUN was
>>    just attached to a guest that is also doing other things that are
>>    independent (e.g. serving a website, processing data from a local
>>    disk, etc) - now the reconnect causes downtime for the entire guest.
> I will look into making the reconnect async over the next few days.
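
For reference, the general technique behind a non-blocking reconnect can
be sketched with plain POSIX sockets. This is not libiscsi's interface
and not a proposal for its API: start_connect(), finish_connect() and
listen_loopback() are hypothetical helper names, and a real
implementation would register the socket with the event loop instead of
calling poll() itself.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Begin a TCP connect without blocking the caller.
 * Returns the socket fd, or -1 on immediate failure. */
static int start_connect(const struct sockaddr_in *addr)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }

    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }

    /* With O_NONBLOCK set, EINPROGRESS means the connect is under way. */
    if (connect(fd, (const struct sockaddr *)addr, sizeof(*addr)) < 0 &&
        errno != EINPROGRESS) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Check on a pending connect, waiting at most timeout_ms.
 * Returns 1 if connected, 0 if still pending, -1 on failure. */
static int finish_connect(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int n = poll(&pfd, 1, timeout_ms);

    if (n == 0) {
        return 0;
    }
    if (n < 0) {
        return errno == EINTR ? 0 : -1;
    }

    /* The socket became writable: SO_ERROR tells success from failure. */
    int err = 0;
    socklen_t len = sizeof(err);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0 || err != 0) {
        return -1;
    }
    return 1;
}

/* Demo helper: listen on an ephemeral loopback port and report the
 * bound address in *addr. Returns the listening fd, or -1. */
static int listen_loopback(struct sockaddr_in *addr)
{
    socklen_t len = sizeof(*addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }

    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = 0; /* let the kernel pick a free port */

    if (bind(fd, (struct sockaddr *)addr, len) < 0 ||
        listen(fd, 1) < 0 ||
        getsockname(fd, (struct sockaddr *)addr, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Between start_connect() and a finish_connect() result of 1 or -1, the
caller stays free to service the monitor and other devices, which is
the property asked for in points 1 and 2 above.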


Thanks for looking into this. I have a few things in mind that I will
post on github to the issue you created.

Peter

Re: [Qemu-devel] [PATCH 08/22] iscsi: implement .bdrv_detach/attach_aio_context()
