> On 07.02.2018 at 19:29, Dr. David Alan Gilbert <[email protected]> wrote:
>
> * Peter Lieven ([email protected]) wrote:
>> On 12.12.2017 at 18:05, Dr. David Alan Gilbert wrote:
>>> * Peter Lieven ([email protected]) wrote:
>>>> On 21.09.2017 at 14:36, Dr. David Alan Gilbert wrote:
>>>>> * Peter Lieven ([email protected]) wrote:
>>>>>> On 19.09.2017 at 16:41, Dr. David Alan Gilbert wrote:
>>>>>>> * Peter Lieven ([email protected]) wrote:
>>>>>>>> On 19.09.2017 at 16:38, Dr. David Alan Gilbert wrote:
>>>>>>>>> * Peter Lieven ([email protected]) wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I just noticed that CPU throttling and block migration don't work
>>>>>>>>>> together very well. During block migration the throttling heuristic
>>>>>>>>>> detects that we obviously make no progress in RAM transfer. But the
>>>>>>>>>> reason is the running block migration, not a too-high dirty page
>>>>>>>>>> rate.
>>>>>>>>>>
>>>>>>>>>> The result is that any VM is throttled by 99% during block
>>>>>>>>>> migration.
>>>>>>>>> Hmm, that's unfortunate; do you have a bandwidth set lower than your
>>>>>>>>> actual network connection? I'm just wondering if it's actually going
>>>>>>>>> between the block and RAM iterative sections or getting stuck in
>>>>>>>>> one.
>>>>>>>> It happens also if source and dest are on the same machine and speed
>>>>>>>> is set to 100G.
>>>>>>> But does it happen if they're not and the speed is set low?
>>>>>> Yes, it does. I noticed it in our test environment between different
>>>>>> nodes with a 10G link in between. But it's totally clear why it
>>>>>> happens. During block migration we transfer all dirty memory pages in
>>>>>> each round (if there is moderate memory load), but all dirty pages are
>>>>>> obviously more than 50% of the transferred RAM in that round; in fact
>>>>>> it is exactly 100%. And the current logic triggers on this condition.
>>>>>>
>>>>>> I think I will go forward and send a patch which disables auto-converge
>>>>>> during the block migration bulk stage.
>>>>> Yes, that's fair; it probably would also make sense to throttle the RAM
>>>>> migration during the block migration bulk stage, since the chances are
>>>>> it's not going to get far. (I think in the NBD setup, the main
>>>>> migration process isn't started until the end of bulk.)
>>>> Catching up with the idea of delaying RAM migration until the block
>>>> bulk stage has completed: what do you think is the easiest way to
>>>> achieve this?
>>> <excavates inbox, and notices I never replied>
>>>
>>> I think the answer depends on whether we think this is a special case
>>> or whether we need a new general-purpose mechanism.
>>>
>>> If it was really general, then we'd probably want to split the iterative
>>> stage in two somehow, and only do RAM in the second half.
>>>
>>> But I'm not sure it's worth it; I suspect the easiest way is:
>>>
>>>   a) Add a counter in migration/ram.c or in the RAM state somewhere
>>>   b) Make ram_save_inhibit increment the counter
>>>   c) Check the counter at the head of ram_save_iterate and just exit
>>>      if it's non-zero
>>>   d) Call ram_save_inhibit from block_save_setup
>>>   e) Then release it when you've finished the bulk stage
>>>
>>> Make sure you still count the RAM in the pending totals, otherwise
>>> migration might think it's finished a bit early.
>>
>> Is there any catch I don't see, or is it as easy as this?
>
> Hmm, looks promising, doesn't it; might need an include or two tidied
> up, but looks worth a try. Just be careful that there are no cases where
> block migration can't transfer data in that state, otherwise we'll keep
> coming back here and spewing empty sections.
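To make sure we mean the same thing, the a)-e) idea boils down to
something like the standalone sketch below. It is only a model compiled
outside the tree; the helper names are made up and it is not the actual
patch against migration/ram.c and migration/block.c:

#include <stdio.h>

/*
 * Standalone model of the inhibit-counter idea; none of these names are
 * the real QEMU API.  block_save_setup() takes the inhibit, the end of
 * the block bulk stage releases it, and ram_save_iterate() sends nothing
 * while the counter is non-zero.
 */
static int ram_save_inhibit_count;

static void ram_save_inhibit(void)   { ram_save_inhibit_count++; }  /* b) */
static void ram_save_uninhibit(void) { ram_save_inhibit_count--; }

static void block_save_setup(void)        /* d) take the inhibit */
{
    ram_save_inhibit();
}

static void block_bulk_stage_done(void)   /* e) release it after bulk */
{
    ram_save_uninhibit();
}

static int ram_save_iterate(void)         /* c) bail out while inhibited */
{
    if (ram_save_inhibit_count > 0) {
        /*
         * Nothing is sent this round.  The dirty RAM still has to show
         * up in the pending totals elsewhere, otherwise migration could
         * think it has converged too early.
         */
        return 0;
    }
    printf("sending dirty RAM pages\n");
    return 1;
}

int main(void)
{
    block_save_setup();
    ram_save_iterate();        /* skipped: block bulk still running */
    block_bulk_stage_done();
    ram_save_iterate();        /* RAM migration proceeds normally   */
    return 0;
}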
I already tested it and it actually works. What would you expect to be
cleaned up before it would be a proper patch?

Are there any implications with RDMA and/or postcopy migration? Is block
migration possible at all with those?

Peter
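PS: For completeness, the auto-converge condition that keeps firing
during the block bulk stage is, as far as I understand it, roughly the
simplified model below (field names and step values are made-up
approximations, not the exact code in migration/ram.c). Since every
dirty page is retransmitted in each round, the dirtied bytes are 100% of
the transferred RAM, the 50% check trips on every sync, and the throttle
walks up to 99%:

#include <stdint.h>
#include <stdio.h>

/*
 * Simplified model of the auto-converge check: if the guest dirtied more
 * than half of what the RAM stage sent since the last sync, twice in a
 * row, throttle the vCPUs harder.
 */
struct ram_counters {
    uint64_t dirtied_bytes_period;  /* dirtied since the last sync      */
    uint64_t sent_bytes_period;     /* actually sent by the RAM stage   */
    int      dirty_rate_high_cnt;   /* consecutive "too dirty" periods  */
    int      throttle_pct;          /* current CPU throttle percentage  */
};

static void sync_and_maybe_throttle(struct ram_counters *rc)
{
    if (rc->dirtied_bytes_period > rc->sent_bytes_period / 2 &&
        ++rc->dirty_rate_high_cnt >= 2) {
        rc->dirty_rate_high_cnt = 0;
        /* start at 20%, add 10% per trigger, cap at 99% (roughly the
         * usual default throttle values) */
        rc->throttle_pct = rc->throttle_pct ? rc->throttle_pct + 10 : 20;
        if (rc->throttle_pct > 99) {
            rc->throttle_pct = 99;
        }
    }
    rc->dirtied_bytes_period = 0;
    rc->sent_bytes_period = 0;
}

int main(void)
{
    struct ram_counters rc = { 0 };

    /* During the block bulk stage every dirty page gets retransmitted in
     * each round, so dirtied bytes == sent bytes, i.e. well above 50%. */
    for (int i = 0; i < 20; i++) {
        rc.dirtied_bytes_period = 512u << 20;  /* 512 MB dirtied       */
        rc.sent_bytes_period    = 512u << 20;  /* 512 MB retransmitted */
        sync_and_maybe_throttle(&rc);
    }
    printf("throttle after 20 sync periods: %d%%\n", rc.throttle_pct);
    return 0;
}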
