On Wed, Apr 18, 2018 at 09:36:33AM +0100, Dr. David Alan Gilbert wrote:
> * Balamuruhan S (bal...@linux.vnet.ibm.com) wrote:
> > On Wed, Apr 18, 2018 at 10:57:26AM +1000, David Gibson wrote:
> > > On Wed, Apr 18, 2018 at 10:55:50AM +1000, David Gibson wrote:
> > > > On Tue, Apr 17, 2018 at 06:53:17PM +0530, Balamuruhan S wrote:
> > > > > expected_downtime value is not accurate with dirty_pages_rate *
> > > > > page_size, using ram_bytes_remaining would yield it correctly.
> > > >
> > > > This commit message hasn't been changed since v1, but the patch is
> > > > doing something completely different. I think most of the info from
> > > > your cover letter needs to be in here.
> > > >
> > > > >
> > > > > Signed-off-by: Balamuruhan S <bal...@linux.vnet.ibm.com>
> > > > > ---
> > > > >  migration/migration.c | 6 +++---
> > > > >  migration/migration.h | 1 +
> > > > >  2 files changed, 4 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/migration/migration.c b/migration/migration.c
> > > > > index 52a5092add..4d866bb920 100644
> > > > > --- a/migration/migration.c
> > > > > +++ b/migration/migration.c
> > > > > @@ -614,7 +614,7 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
> > > > >      }
> > > > >
> > > > >      if (s->state != MIGRATION_STATUS_COMPLETED) {
> > > > > -        info->ram->remaining = ram_bytes_remaining();
> > > > > +        info->ram->remaining = s->ram_bytes_remaining;
> > > > >          info->ram->dirty_pages_rate = ram_counters.dirty_pages_rate;
> > > > >      }
> > > > >  }
> > > > > @@ -2227,6 +2227,7 @@ static void migration_update_counters(MigrationState *s,
> > > > >      transferred = qemu_ftell(s->to_dst_file) - s->iteration_initial_bytes;
> > > > >      time_spent = current_time - s->iteration_start_time;
> > > > >      bandwidth = (double)transferred / time_spent;
> > > > > +    s->ram_bytes_remaining = ram_bytes_remaining();
> > > > >      s->threshold_size = bandwidth * s->parameters.downtime_limit;
> > > > >
> > > > >      s->mbps = (((double) transferred * 8.0) /
> > > > > @@ -2237,8 +2238,7 @@ static void migration_update_counters(MigrationState *s,
> > > > >       * recalculate. 10000 is a small enough number for our purposes
> > > > >       */
> > > > >      if (ram_counters.dirty_pages_rate && transferred > 10000) {
> > > > > -        s->expected_downtime = ram_counters.dirty_pages_rate *
> > > > > -                                   qemu_target_page_size() / bandwidth;
> > > > > +        s->expected_downtime = s->ram_bytes_remaining / bandwidth;
> > > > >      }
> > >
> > > ..but more importantly, I still think this change is bogus. expected
> > > downtime is not the same thing as remaining ram / bandwidth.
> >
> > I tested precopy migration of a 16M hugepage-backed P8 guest from a P8
> > host to a 1G hugepage P9 host, and observed that precopy migration never
> > completed when expected_downtime was used as the downtime-limit.
>
> Did you debug why it was infinite? Which component of the calculation
> had gone wrong and why?
>
> > During the discussion for Bug RH1560562, Michael Roth quoted that
> >
> > One thing to note: in my testing I found that the "expected downtime" value
> > seems inaccurate in this scenario. To find a max downtime that allowed
> > migration to complete I had to divide "remaining ram" by "throughput" from
> > "info migrate" (after the initial pre-copy pass through ram, i.e. once
> > "dirty pages" value starts getting reported and we're just sending dirtied
> > pages).
> >
> > Later, using this approach, precopy migration was able to complete.
> >
> > Adding Michael Roth in Cc.
>
> We should try and _understand_ the rationale for the change, not just go
> with it. Now, remember that whatever we do is just an estimate and

I have made the change based on my understanding. Currently the calculation is:

expected_downtime = (dirty_pages_rate * qemu_target_page_size) / bandwidth

dirty_pages_rate = number of dirty pages / time  => unit (1 / seconds)
qemu_target_page_size                            => unit (bytes)
dirty_pages_rate * qemu_target_page_size         => unit (bytes / seconds)
bandwidth = bytes transferred / time             => unit (bytes / seconds)

Dividing one by the other does not yield a measurement of time (the result
is a dimensionless ratio).

> there will be lots of cases where it's bad - so be careful what you're
> using it for - you definitely should NOT use the value in any automated
> system.

I agree with that, and I would not use it in an automated system.

> My problem with just using ram_bytes_remaining is that it doesn't take
> into account the rate at which the guest is changing RAM - which feels
> like it's the important measure for expected downtime.

ram_bytes_remaining = ram_state->migration_dirty_pages * TARGET_PAGE_SIZE

This means ram_bytes_remaining is proportional to the RAM the guest is
changing, so we can consider that this change would yield the
expected_downtime.

Regards,
Bala

> > Dave
> >
> > > Regards,
> > > Bala
> > >
> > > > > >
> > > > > >      qemu_file_reset_rate_limit(s->to_dst_file);
> > > > > > diff --git a/migration/migration.h b/migration/migration.h
> > > > > > index 8d2f320c48..8584f8e22e 100644
> > > > > > --- a/migration/migration.h
> > > > > > +++ b/migration/migration.h
> > > > > > @@ -128,6 +128,7 @@ struct MigrationState
> > > > > >      int64_t downtime_start;
> > > > > >      int64_t downtime;
> > > > > >      int64_t expected_downtime;
> > > > > > +    int64_t ram_bytes_remaining;
> > > > > >      bool enabled_capabilities[MIGRATION_CAPABILITY__MAX];
> > > > > >      int64_t setup_time;
> > > > > >      /*
> > > >
> > > > --
> > > > David Gibson                    | I'll have my music baroque, and my code
> > > > david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
> > > >                                 | _way_ _around_!
> > > > http://www.ozlabs.org/~dgibson
> >
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
>