Re: [Qemu-block] block migration and MAX_IN_FLIGHT_IO
Am 07.03.2018 um 10:47 schrieb Stefan Hajnoczi: > On Wed, Mar 7, 2018 at 7:55 AM, Peter Lieven wrote: >> Am 06.03.2018 um 17:35 schrieb Peter Lieven: >>> Am 06.03.2018 um 17:07 schrieb Stefan Hajnoczi: On Mon, Mar 05, 2018 at 02:52:16PM +, Dr. David Alan Gilbert wrote: > * Peter Lieven (p...@kamp.de) wrote: >> Am 05.03.2018 um 12:45 schrieb Stefan Hajnoczi: >>> On Thu, Feb 22, 2018 at 12:13:50PM +0100, Peter Lieven wrote: I stumbled across the MAX_INFLIGHT_IO field that was introduced in 2015 and was curious what was the reason to choose 512MB as readahead? The question is that I found that the source VM gets very unresponsive I/O wise while the initial 512MB are read and furthermore seems to stay unreasponsive if we choose a high migration speed and have a fast storage on the destination VM. In our environment I modified this value to 16MB which seems to work much smoother. I wonder if we should make this a user configurable value or define a different rate limit for the block transfer in bulk stage at least? >>> I don't know if benchmarks were run when choosing the value. From the >>> commit description it sounds like the main purpose was to limit the >>> amount of memory that can be consumed. >>> >>> 16 MB also fulfills that criteria :), but why is the source VM more >>> responsive with a lower value? >>> >>> Perhaps the issue is queue depth on the storage device - the block >>> migration code enqueues up to 512 MB worth of reads, and guest I/O has >>> to wait? >> That is my guess. Especially if the destination storage is faster we >> basically alsways have >> 512 I/Os in flight on the source storage. >> >> Does anyone mind if the reduce that value to 16MB or do we need a better >> mechanism? > We've got migration-parameters these days; you could connect it to one > of those fairly easily I think. > Try: grep -i 'cpu[-_]throttle[-_]initial' for an example of one that's > already there. > Then you can set it to whatever you like. It would be nice to solve the performance problem without adding a tuneable. On the other hand, QEMU has no idea what the queue depth of the device is. Therefore it cannot prioritize guest I/O over block migration I/O. 512 parallel requests is much too high. Most parallel I/O benchmarking is done at 32-64 queue depth. I think that 16 parallel requests is a reasonable maximum number for a background job. We need to be clear though that the purpose of this change is unrelated to the original 512 MB memory footprint goal. It just happens to touch the same constant but the goal is now to submit at most 16 I/O requests in parallel to avoid monopolizing the I/O device. >>> I think we should really look at this. The variables that control if we >>> stay in the while loop or not are incremented and decremented >>> at the following places: >>> >>> mig_save_device_dirty: >>> mig_save_device_bulk: >>> block_mig_state.submitted++; >>> >>> blk_mig_read_cb: >>> block_mig_state.submitted--; >>> block_mig_state.read_done++; >>> >>> flush_blks: >>> block_mig_state.read_done--; >>> >>> The condition of the while loop is: >>> (block_mig_state.submitted + >>> block_mig_state.read_done) * BLOCK_SIZE < >>>qemu_file_get_rate_limit(f) && >>>(block_mig_state.submitted + >>> block_mig_state.read_done) < >>>MAX_INFLIGHT_IO) >>> >>> At first I wonder if we ever reach the rate-limit because we put the read >>> buffers onto f AFTER we exit the while loop? >>> >>> And even if we reach the limit we constantly maintain 512 I/Os in parallel >>> because we immediately decrement read_done >>> when we put the buffers to f in flush_blks. In the next iteration of the >>> while loop we then read again until we have 512 in-flight I/Os. >>> >>> And shouldn't we have a time limit to limit the time we stay in the while >>> loop? I think we artificially delay sending data to f? >> Thinking about it for a while I would propose the following: >> >> a) rename MAX_INFLIGHT_IO to MAX_IO_BUFFERS >> b) add MAX_PARALLEL_IO with a value of 16 >> c) compare qemu_file_get_rate_limit only with block_mig_state.read_done >> >> This would yield in the following condition for the while loop: >> >> (block_mig_state.read_done * BLOCK_SIZE < qemu_file_get_rate_limit(f) && >> (block_mig_state.submitted + block_mig_state.read_done) < MAX_IO_BUFFERS && >> block_mig_state.submitted < MAX_PARALLEL_IO) >> >> Sounds that like a plan? > That sounds good to me. I will prepare patches for this. Peter
Re: [Qemu-block] block migration and MAX_IN_FLIGHT_IO
On Wed, Mar 7, 2018 at 7:55 AM, Peter Lieven wrote: > Am 06.03.2018 um 17:35 schrieb Peter Lieven: >> Am 06.03.2018 um 17:07 schrieb Stefan Hajnoczi: >>> On Mon, Mar 05, 2018 at 02:52:16PM +, Dr. David Alan Gilbert wrote: * Peter Lieven (p...@kamp.de) wrote: > Am 05.03.2018 um 12:45 schrieb Stefan Hajnoczi: >> On Thu, Feb 22, 2018 at 12:13:50PM +0100, Peter Lieven wrote: >>> I stumbled across the MAX_INFLIGHT_IO field that was introduced in 2015 >>> and was curious what was the reason >>> to choose 512MB as readahead? The question is that I found that the >>> source VM gets very unresponsive I/O wise >>> while the initial 512MB are read and furthermore seems to stay >>> unreasponsive if we choose a high migration speed >>> and have a fast storage on the destination VM. >>> >>> In our environment I modified this value to 16MB which seems to work >>> much smoother. I wonder if we should make >>> this a user configurable value or define a different rate limit for the >>> block transfer in bulk stage at least? >> I don't know if benchmarks were run when choosing the value. From the >> commit description it sounds like the main purpose was to limit the >> amount of memory that can be consumed. >> >> 16 MB also fulfills that criteria :), but why is the source VM more >> responsive with a lower value? >> >> Perhaps the issue is queue depth on the storage device - the block >> migration code enqueues up to 512 MB worth of reads, and guest I/O has >> to wait? > That is my guess. Especially if the destination storage is faster we > basically alsways have > 512 I/Os in flight on the source storage. > > Does anyone mind if the reduce that value to 16MB or do we need a better > mechanism? We've got migration-parameters these days; you could connect it to one of those fairly easily I think. Try: grep -i 'cpu[-_]throttle[-_]initial' for an example of one that's already there. Then you can set it to whatever you like. >>> It would be nice to solve the performance problem without adding a >>> tuneable. >>> >>> On the other hand, QEMU has no idea what the queue depth of the device >>> is. Therefore it cannot prioritize guest I/O over block migration I/O. >>> >>> 512 parallel requests is much too high. Most parallel I/O benchmarking >>> is done at 32-64 queue depth. >>> >>> I think that 16 parallel requests is a reasonable maximum number for a >>> background job. >>> >>> We need to be clear though that the purpose of this change is unrelated >>> to the original 512 MB memory footprint goal. It just happens to touch >>> the same constant but the goal is now to submit at most 16 I/O requests >>> in parallel to avoid monopolizing the I/O device. >> I think we should really look at this. The variables that control if we stay >> in the while loop or not are incremented and decremented >> at the following places: >> >> mig_save_device_dirty: >> mig_save_device_bulk: >> block_mig_state.submitted++; >> >> blk_mig_read_cb: >> block_mig_state.submitted--; >> block_mig_state.read_done++; >> >> flush_blks: >> block_mig_state.read_done--; >> >> The condition of the while loop is: >> (block_mig_state.submitted + >> block_mig_state.read_done) * BLOCK_SIZE < >>qemu_file_get_rate_limit(f) && >>(block_mig_state.submitted + >> block_mig_state.read_done) < >>MAX_INFLIGHT_IO) >> >> At first I wonder if we ever reach the rate-limit because we put the read >> buffers onto f AFTER we exit the while loop? >> >> And even if we reach the limit we constantly maintain 512 I/Os in parallel >> because we immediately decrement read_done >> when we put the buffers to f in flush_blks. In the next iteration of the >> while loop we then read again until we have 512 in-flight I/Os. >> >> And shouldn't we have a time limit to limit the time we stay in the while >> loop? I think we artificially delay sending data to f? > > Thinking about it for a while I would propose the following: > > a) rename MAX_INFLIGHT_IO to MAX_IO_BUFFERS > b) add MAX_PARALLEL_IO with a value of 16 > c) compare qemu_file_get_rate_limit only with block_mig_state.read_done > > This would yield in the following condition for the while loop: > > (block_mig_state.read_done * BLOCK_SIZE < qemu_file_get_rate_limit(f) && > (block_mig_state.submitted + block_mig_state.read_done) < MAX_IO_BUFFERS && > block_mig_state.submitted < MAX_PARALLEL_IO) > > Sounds that like a plan? That sounds good to me. Stefan
Re: [Qemu-block] block migration and MAX_IN_FLIGHT_IO
Am 06.03.2018 um 17:35 schrieb Peter Lieven: > Am 06.03.2018 um 17:07 schrieb Stefan Hajnoczi: >> On Mon, Mar 05, 2018 at 02:52:16PM +, Dr. David Alan Gilbert wrote: >>> * Peter Lieven (p...@kamp.de) wrote: Am 05.03.2018 um 12:45 schrieb Stefan Hajnoczi: > On Thu, Feb 22, 2018 at 12:13:50PM +0100, Peter Lieven wrote: >> I stumbled across the MAX_INFLIGHT_IO field that was introduced in 2015 >> and was curious what was the reason >> to choose 512MB as readahead? The question is that I found that the >> source VM gets very unresponsive I/O wise >> while the initial 512MB are read and furthermore seems to stay >> unreasponsive if we choose a high migration speed >> and have a fast storage on the destination VM. >> >> In our environment I modified this value to 16MB which seems to work >> much smoother. I wonder if we should make >> this a user configurable value or define a different rate limit for the >> block transfer in bulk stage at least? > I don't know if benchmarks were run when choosing the value. From the > commit description it sounds like the main purpose was to limit the > amount of memory that can be consumed. > > 16 MB also fulfills that criteria :), but why is the source VM more > responsive with a lower value? > > Perhaps the issue is queue depth on the storage device - the block > migration code enqueues up to 512 MB worth of reads, and guest I/O has > to wait? That is my guess. Especially if the destination storage is faster we basically alsways have 512 I/Os in flight on the source storage. Does anyone mind if the reduce that value to 16MB or do we need a better mechanism? >>> We've got migration-parameters these days; you could connect it to one >>> of those fairly easily I think. >>> Try: grep -i 'cpu[-_]throttle[-_]initial' for an example of one that's >>> already there. >>> Then you can set it to whatever you like. >> It would be nice to solve the performance problem without adding a >> tuneable. >> >> On the other hand, QEMU has no idea what the queue depth of the device >> is. Therefore it cannot prioritize guest I/O over block migration I/O. >> >> 512 parallel requests is much too high. Most parallel I/O benchmarking >> is done at 32-64 queue depth. >> >> I think that 16 parallel requests is a reasonable maximum number for a >> background job. >> >> We need to be clear though that the purpose of this change is unrelated >> to the original 512 MB memory footprint goal. It just happens to touch >> the same constant but the goal is now to submit at most 16 I/O requests >> in parallel to avoid monopolizing the I/O device. > I think we should really look at this. The variables that control if we stay > in the while loop or not are incremented and decremented > at the following places: > > mig_save_device_dirty: > mig_save_device_bulk: > block_mig_state.submitted++; > > blk_mig_read_cb: > block_mig_state.submitted--; > block_mig_state.read_done++; > > flush_blks: > block_mig_state.read_done--; > > The condition of the while loop is: > (block_mig_state.submitted + > block_mig_state.read_done) * BLOCK_SIZE < >qemu_file_get_rate_limit(f) && >(block_mig_state.submitted + > block_mig_state.read_done) < >MAX_INFLIGHT_IO) > > At first I wonder if we ever reach the rate-limit because we put the read > buffers onto f AFTER we exit the while loop? > > And even if we reach the limit we constantly maintain 512 I/Os in parallel > because we immediately decrement read_done > when we put the buffers to f in flush_blks. In the next iteration of the > while loop we then read again until we have 512 in-flight I/Os. > > And shouldn't we have a time limit to limit the time we stay in the while > loop? I think we artificially delay sending data to f? Thinking about it for a while I would propose the following: a) rename MAX_INFLIGHT_IO to MAX_IO_BUFFERS b) add MAX_PARALLEL_IO with a value of 16 c) compare qemu_file_get_rate_limit only with block_mig_state.read_done This would yield in the following condition for the while loop: (block_mig_state.read_done * BLOCK_SIZE < qemu_file_get_rate_limit(f) && (block_mig_state.submitted + block_mig_state.read_done) < MAX_IO_BUFFERS && block_mig_state.submitted < MAX_PARALLEL_IO) Sounds that like a plan? Peter
Re: [Qemu-block] block migration and MAX_IN_FLIGHT_IO
Am 06.03.2018 um 17:07 schrieb Stefan Hajnoczi: > On Mon, Mar 05, 2018 at 02:52:16PM +, Dr. David Alan Gilbert wrote: >> * Peter Lieven (p...@kamp.de) wrote: >>> Am 05.03.2018 um 12:45 schrieb Stefan Hajnoczi: On Thu, Feb 22, 2018 at 12:13:50PM +0100, Peter Lieven wrote: > I stumbled across the MAX_INFLIGHT_IO field that was introduced in 2015 > and was curious what was the reason > to choose 512MB as readahead? The question is that I found that the > source VM gets very unresponsive I/O wise > while the initial 512MB are read and furthermore seems to stay > unreasponsive if we choose a high migration speed > and have a fast storage on the destination VM. > > In our environment I modified this value to 16MB which seems to work much > smoother. I wonder if we should make > this a user configurable value or define a different rate limit for the > block transfer in bulk stage at least? I don't know if benchmarks were run when choosing the value. From the commit description it sounds like the main purpose was to limit the amount of memory that can be consumed. 16 MB also fulfills that criteria :), but why is the source VM more responsive with a lower value? Perhaps the issue is queue depth on the storage device - the block migration code enqueues up to 512 MB worth of reads, and guest I/O has to wait? >>> That is my guess. Especially if the destination storage is faster we >>> basically alsways have >>> 512 I/Os in flight on the source storage. >>> >>> Does anyone mind if the reduce that value to 16MB or do we need a better >>> mechanism? >> We've got migration-parameters these days; you could connect it to one >> of those fairly easily I think. >> Try: grep -i 'cpu[-_]throttle[-_]initial' for an example of one that's >> already there. >> Then you can set it to whatever you like. > It would be nice to solve the performance problem without adding a > tuneable. > > On the other hand, QEMU has no idea what the queue depth of the device > is. Therefore it cannot prioritize guest I/O over block migration I/O. > > 512 parallel requests is much too high. Most parallel I/O benchmarking > is done at 32-64 queue depth. > > I think that 16 parallel requests is a reasonable maximum number for a > background job. > > We need to be clear though that the purpose of this change is unrelated > to the original 512 MB memory footprint goal. It just happens to touch > the same constant but the goal is now to submit at most 16 I/O requests > in parallel to avoid monopolizing the I/O device. I think we should really look at this. The variables that control if we stay in the while loop or not are incremented and decremented at the following places: mig_save_device_dirty: mig_save_device_bulk: block_mig_state.submitted++; blk_mig_read_cb: block_mig_state.submitted--; block_mig_state.read_done++; flush_blks: block_mig_state.read_done--; The condition of the while loop is: (block_mig_state.submitted + block_mig_state.read_done) * BLOCK_SIZE < qemu_file_get_rate_limit(f) && (block_mig_state.submitted + block_mig_state.read_done) < MAX_INFLIGHT_IO) At first I wonder if we ever reach the rate-limit because we put the read buffers onto f AFTER we exit the while loop? And even if we reach the limit we constantly maintain 512 I/Os in parallel because we immediately decrement read_done when we put the buffers to f in flush_blks. In the next iteration of the while loop we then read again until we have 512 in-flight I/Os. And shouldn't we have a time limit to limit the time we stay in the while loop? I think we artificially delay sending data to f? Peter
Re: [Qemu-block] block migration and MAX_IN_FLIGHT_IO
Am 05.03.2018 um 15:52 schrieb Dr. David Alan Gilbert: > * Peter Lieven (p...@kamp.de) wrote: >> Am 05.03.2018 um 12:45 schrieb Stefan Hajnoczi: >>> On Thu, Feb 22, 2018 at 12:13:50PM +0100, Peter Lieven wrote: I stumbled across the MAX_INFLIGHT_IO field that was introduced in 2015 and was curious what was the reason to choose 512MB as readahead? The question is that I found that the source VM gets very unresponsive I/O wise while the initial 512MB are read and furthermore seems to stay unreasponsive if we choose a high migration speed and have a fast storage on the destination VM. In our environment I modified this value to 16MB which seems to work much smoother. I wonder if we should make this a user configurable value or define a different rate limit for the block transfer in bulk stage at least? >>> I don't know if benchmarks were run when choosing the value. From the >>> commit description it sounds like the main purpose was to limit the >>> amount of memory that can be consumed. >>> >>> 16 MB also fulfills that criteria :), but why is the source VM more >>> responsive with a lower value? >>> >>> Perhaps the issue is queue depth on the storage device - the block >>> migration code enqueues up to 512 MB worth of reads, and guest I/O has >>> to wait? >> That is my guess. Especially if the destination storage is faster we >> basically alsways have >> 512 I/Os in flight on the source storage. >> >> Does anyone mind if the reduce that value to 16MB or do we need a better >> mechanism? > We've got migration-parameters these days; you could connect it to one > of those fairly easily I think. > Try: grep -i 'cpu[-_]throttle[-_]initial' for an example of one that's > already there. > Then you can set it to whatever you like. I will have a look at this. Thank you, Peter > > Dave > >> Peter >> >> > -- > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-block] block migration and MAX_IN_FLIGHT_IO
On Mon, Mar 05, 2018 at 02:52:16PM +, Dr. David Alan Gilbert wrote: > * Peter Lieven (p...@kamp.de) wrote: > > Am 05.03.2018 um 12:45 schrieb Stefan Hajnoczi: > > > On Thu, Feb 22, 2018 at 12:13:50PM +0100, Peter Lieven wrote: > > >> I stumbled across the MAX_INFLIGHT_IO field that was introduced in 2015 > > >> and was curious what was the reason > > >> to choose 512MB as readahead? The question is that I found that the > > >> source VM gets very unresponsive I/O wise > > >> while the initial 512MB are read and furthermore seems to stay > > >> unreasponsive if we choose a high migration speed > > >> and have a fast storage on the destination VM. > > >> > > >> In our environment I modified this value to 16MB which seems to work > > >> much smoother. I wonder if we should make > > >> this a user configurable value or define a different rate limit for the > > >> block transfer in bulk stage at least? > > > I don't know if benchmarks were run when choosing the value. From the > > > commit description it sounds like the main purpose was to limit the > > > amount of memory that can be consumed. > > > > > > 16 MB also fulfills that criteria :), but why is the source VM more > > > responsive with a lower value? > > > > > > Perhaps the issue is queue depth on the storage device - the block > > > migration code enqueues up to 512 MB worth of reads, and guest I/O has > > > to wait? > > > > That is my guess. Especially if the destination storage is faster we > > basically alsways have > > 512 I/Os in flight on the source storage. > > > > Does anyone mind if the reduce that value to 16MB or do we need a better > > mechanism? > > We've got migration-parameters these days; you could connect it to one > of those fairly easily I think. > Try: grep -i 'cpu[-_]throttle[-_]initial' for an example of one that's > already there. > Then you can set it to whatever you like. It would be nice to solve the performance problem without adding a tuneable. On the other hand, QEMU has no idea what the queue depth of the device is. Therefore it cannot prioritize guest I/O over block migration I/O. 512 parallel requests is much too high. Most parallel I/O benchmarking is done at 32-64 queue depth. I think that 16 parallel requests is a reasonable maximum number for a background job. We need to be clear though that the purpose of this change is unrelated to the original 512 MB memory footprint goal. It just happens to touch the same constant but the goal is now to submit at most 16 I/O requests in parallel to avoid monopolizing the I/O device. Stefan signature.asc Description: PGP signature
Re: [Qemu-block] block migration and MAX_IN_FLIGHT_IO
* Peter Lieven (p...@kamp.de) wrote: > Am 05.03.2018 um 12:45 schrieb Stefan Hajnoczi: > > On Thu, Feb 22, 2018 at 12:13:50PM +0100, Peter Lieven wrote: > >> I stumbled across the MAX_INFLIGHT_IO field that was introduced in 2015 > >> and was curious what was the reason > >> to choose 512MB as readahead? The question is that I found that the source > >> VM gets very unresponsive I/O wise > >> while the initial 512MB are read and furthermore seems to stay > >> unreasponsive if we choose a high migration speed > >> and have a fast storage on the destination VM. > >> > >> In our environment I modified this value to 16MB which seems to work much > >> smoother. I wonder if we should make > >> this a user configurable value or define a different rate limit for the > >> block transfer in bulk stage at least? > > I don't know if benchmarks were run when choosing the value. From the > > commit description it sounds like the main purpose was to limit the > > amount of memory that can be consumed. > > > > 16 MB also fulfills that criteria :), but why is the source VM more > > responsive with a lower value? > > > > Perhaps the issue is queue depth on the storage device - the block > > migration code enqueues up to 512 MB worth of reads, and guest I/O has > > to wait? > > That is my guess. Especially if the destination storage is faster we > basically alsways have > 512 I/Os in flight on the source storage. > > Does anyone mind if the reduce that value to 16MB or do we need a better > mechanism? We've got migration-parameters these days; you could connect it to one of those fairly easily I think. Try: grep -i 'cpu[-_]throttle[-_]initial' for an example of one that's already there. Then you can set it to whatever you like. Dave > Peter > > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-block] block migration and MAX_IN_FLIGHT_IO
Am 05.03.2018 um 12:45 schrieb Stefan Hajnoczi: > On Thu, Feb 22, 2018 at 12:13:50PM +0100, Peter Lieven wrote: >> I stumbled across the MAX_INFLIGHT_IO field that was introduced in 2015 and >> was curious what was the reason >> to choose 512MB as readahead? The question is that I found that the source >> VM gets very unresponsive I/O wise >> while the initial 512MB are read and furthermore seems to stay unreasponsive >> if we choose a high migration speed >> and have a fast storage on the destination VM. >> >> In our environment I modified this value to 16MB which seems to work much >> smoother. I wonder if we should make >> this a user configurable value or define a different rate limit for the >> block transfer in bulk stage at least? > I don't know if benchmarks were run when choosing the value. From the > commit description it sounds like the main purpose was to limit the > amount of memory that can be consumed. > > 16 MB also fulfills that criteria :), but why is the source VM more > responsive with a lower value? > > Perhaps the issue is queue depth on the storage device - the block > migration code enqueues up to 512 MB worth of reads, and guest I/O has > to wait? That is my guess. Especially if the destination storage is faster we basically alsways have 512 I/Os in flight on the source storage. Does anyone mind if the reduce that value to 16MB or do we need a better mechanism? Peter
Re: [Qemu-block] block migration and MAX_IN_FLIGHT_IO
On Thu, Feb 22, 2018 at 12:13:50PM +0100, Peter Lieven wrote: > I stumbled across the MAX_INFLIGHT_IO field that was introduced in 2015 and > was curious what was the reason > to choose 512MB as readahead? The question is that I found that the source VM > gets very unresponsive I/O wise > while the initial 512MB are read and furthermore seems to stay unreasponsive > if we choose a high migration speed > and have a fast storage on the destination VM. > > In our environment I modified this value to 16MB which seems to work much > smoother. I wonder if we should make > this a user configurable value or define a different rate limit for the block > transfer in bulk stage at least? I don't know if benchmarks were run when choosing the value. From the commit description it sounds like the main purpose was to limit the amount of memory that can be consumed. 16 MB also fulfills that criteria :), but why is the source VM more responsive with a lower value? Perhaps the issue is queue depth on the storage device - the block migration code enqueues up to 512 MB worth of reads, and guest I/O has to wait? Stefan signature.asc Description: PGP signature