Am 06.03.2018 um 17:35 schrieb Peter Lieven:
> Am 06.03.2018 um 17:07 schrieb Stefan Hajnoczi:
>> On Mon, Mar 05, 2018 at 02:52:16PM +0000, Dr. David Alan Gilbert wrote:
>>> * Peter Lieven ( wrote:
>>>> Am 05.03.2018 um 12:45 schrieb Stefan Hajnoczi:
>>>>> On Thu, Feb 22, 2018 at 12:13:50PM +0100, Peter Lieven wrote:
>>>>>> I stumbled across the MAX_INFLIGHT_IO field that was introduced in 2015 
>>>>>> and was curious what was the reason
>>>>>> to choose 512MB as readahead? The question is that I found that the 
>>>>>> source VM gets very unresponsive I/O wise
>>>>>> while the initial 512MB are read and furthermore seems to stay 
>>>>>> unreasponsive if we choose a high migration speed
>>>>>> and have a fast storage on the destination VM.
>>>>>> In our environment I modified this value to 16MB which seems to work 
>>>>>> much smoother. I wonder if we should make
>>>>>> this a user configurable value or define a different rate limit for the 
>>>>>> block transfer in bulk stage at least?
>>>>> I don't know if benchmarks were run when choosing the value.  From the
>>>>> commit description it sounds like the main purpose was to limit the
>>>>> amount of memory that can be consumed.
>>>>> 16 MB also fulfills that criteria :), but why is the source VM more
>>>>> responsive with a lower value?
>>>>> Perhaps the issue is queue depth on the storage device - the block
>>>>> migration code enqueues up to 512 MB worth of reads, and guest I/O has
>>>>> to wait?
>>>> That is my guess. Especially if the destination storage is faster we 
>>>> basically alsways have
>>>> 512 I/Os in flight on the source storage.
>>>> Does anyone mind if the reduce that value to 16MB or do we need a better 
>>>> mechanism?
>>> We've got migration-parameters these days; you could connect it to one
>>> of those fairly easily I think.
>>> Try: grep -i 'cpu[-_]throttle[-_]initial'  for an example of one that's
>>> already there.
>>> Then you can set it to whatever you like.
>> It would be nice to solve the performance problem without adding a
>> tuneable.
>> On the other hand, QEMU has no idea what the queue depth of the device
>> is.  Therefore it cannot prioritize guest I/O over block migration I/O.
>> 512 parallel requests is much too high.  Most parallel I/O benchmarking
>> is done at 32-64 queue depth.
>> I think that 16 parallel requests is a reasonable maximum number for a
>> background job.
>> We need to be clear though that the purpose of this change is unrelated
>> to the original 512 MB memory footprint goal.  It just happens to touch
>> the same constant but the goal is now to submit at most 16 I/O requests
>> in parallel to avoid monopolizing the I/O device.
> I think we should really look at this. The variables that control if we stay 
> in the while loop or not are incremented and decremented
> at the following places:
> mig_save_device_dirty:
> mig_save_device_bulk:
>     block_mig_state.submitted++;
> blk_mig_read_cb:
>     block_mig_state.submitted--;
>     block_mig_state.read_done++;
> flush_blks:
>     block_mig_state.read_done--;
> The condition of the while loop is:
> (block_mig_state.submitted +
>             block_mig_state.read_done) * BLOCK_SIZE <
>            qemu_file_get_rate_limit(f) &&
>            (block_mig_state.submitted +
>             block_mig_state.read_done) <
>            MAX_INFLIGHT_IO)
> At first I wonder if we ever reach the rate-limit because we put the read 
> buffers onto f AFTER we exit the while loop?
> And even if we reach the limit we constantly maintain 512 I/Os in parallel 
> because we immediately decrement read_done
> when we put the buffers to f in flush_blks. In the next iteration of the 
> while loop we then read again until we have 512 in-flight I/Os.
> And shouldn't we have a time limit to limit the time we stay in the while 
> loop? I think we artificially delay sending data to f?

Thinking about it for a while I would propose the following:

b) add MAX_PARALLEL_IO with a value of 16
c) compare qemu_file_get_rate_limit only with block_mig_state.read_done

This would yield in the following condition for the while loop:

(block_mig_state.read_done * BLOCK_SIZE < qemu_file_get_rate_limit(f) &&
 (block_mig_state.submitted + block_mig_state.read_done) < MAX_IO_BUFFERS &&
 block_mig_state.submitted < MAX_PARALLEL_IO)

Sounds that like a plan?


Reply via email to