Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion

2019-12-04 Thread David Gibson
On Wed, Dec 04, 2019 at 04:49:15PM +0000, Dr. David Alan Gilbert wrote:
> * Scott Cheloha (chel...@linux.vnet.ibm.com) wrote:
> > On Mon, Oct 21, 2019 at 09:14:44AM +0100, Dr. David Alan Gilbert wrote:
> > > * David Gibson (da...@gibson.dropbear.id.au) wrote:
> > > > On Fri, Oct 18, 2019 at 10:43:52AM +0100, Dr. David Alan Gilbert wrote:
> > > > > * Laurent Vivier (lviv...@redhat.com) wrote:
> > > > > > On 18/10/2019 10:16, Dr. David Alan Gilbert wrote:
> > > > > > > * Scott Cheloha (chel...@linux.vnet.ibm.com) wrote:
> > > > > > >> savevm_state's SaveStateEntry TAILQ is a priority queue.  Priority
> > > > > > >> sorting is maintained by searching from head to tail for a suitable
> > > > > > >> insertion spot.  Insertion is thus an O(n) operation.
> > > > > > >>
> > > > > > >> If we instead keep track of the head of each priority's subqueue
> > > > > > >> within that larger queue we can reduce this operation to O(1) time.
> > > > > > >>
> > > > > > >> savevm_state_handler_remove() becomes slightly more complex to
> > > > > > >> accommodate these gains: we need to replace the head of a priority's
> > > > > > >> subqueue when removing it.
> > > > > > >>
> > > > > > >> With O(1) insertion, booting VMs with many SaveStateEntry objects is
> > > > > > >> more plausible.  For example, a ppc64 VM with maxmem=8T has 40000 such
> > > > > > >> objects to insert.
> > > > > > > 
> > > > > > > Separate from reviewing this patch, I'd like to understand why you've
> > > > > > > got 40000 objects.  This feels very very wrong and is likely to cause
> > > > > > > problems to random other bits of qemu as well.
> > > > > > 
> > > > > > I think the 40000 objects are the "dr-connectors" that are used to plug
> > > > > > peripherals (memory, pci card, cpus, ...).
> > > > > 
> > > > > Yes, Scott confirmed that in the reply to the previous version.
> > > > > IMHO nothing in qemu is designed to deal with that many devices/objects
> > > > > - I'm sure that something other than the migration code is going to
> > > > > get upset.
> > > > 
> > > > It kind of did.  Particularly when there was n^2 and n^3
> > > > behaviour in the property stuff we had some ludicrously long startup
> > > > times (hours) with large maxmem values.
> > > > 
> > > > Fwiw, the DRCs for PCI slots, DRCs and PHBs aren't really a problem.
> > > > The problem is the memory DRCs, there's one for each LMB - each 256MiB
> > > > chunk of memory (or possible memory).
> > > > 
> > > > > Is perhaps the structure wrong somewhere - should there be a single DRC
> > > > > device that knows about all DRCs?
> > > > 
> > > > Maybe.  The tricky bit is how to get there from here without breaking
> > > > migration or something else along the way.
> > > 
> > > Switch on the next machine type version - it doesn't matter if migration
> > > is incompatible then.
> > 
> > 1mo bump.
> > 
> > Is there anything I need to do with this patch in particular to make it
> > suitable for merging?
> 
> Apologies for the delay;  hopefully this will go in one of the pulls
> just after the tree opens again.
> 
> Please please try and work on reducing the number of objects somehow -
> while this migration fix is a useful short term fix, and not too
> invasive; having that many objects around qemu is a really really bad
> idea so needs fixing properly.

I'm hoping to have a crack at this tomorrow.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion

2019-12-04 Thread Dr. David Alan Gilbert
* Scott Cheloha (chel...@linux.vnet.ibm.com) wrote:
> On Mon, Oct 21, 2019 at 09:14:44AM +0100, Dr. David Alan Gilbert wrote:
> > * David Gibson (da...@gibson.dropbear.id.au) wrote:
> > > On Fri, Oct 18, 2019 at 10:43:52AM +0100, Dr. David Alan Gilbert wrote:
> > > > * Laurent Vivier (lviv...@redhat.com) wrote:
> > > > > On 18/10/2019 10:16, Dr. David Alan Gilbert wrote:
> > > > > > * Scott Cheloha (chel...@linux.vnet.ibm.com) wrote:
> > > > > >> savevm_state's SaveStateEntry TAILQ is a priority queue.  Priority
> > > > > >> sorting is maintained by searching from head to tail for a suitable
> > > > > >> insertion spot.  Insertion is thus an O(n) operation.
> > > > > >>
> > > > > >> If we instead keep track of the head of each priority's subqueue
> > > > > >> within that larger queue we can reduce this operation to O(1) time.
> > > > > >>
> > > > > >> savevm_state_handler_remove() becomes slightly more complex to
> > > > > >> accommodate these gains: we need to replace the head of a priority's
> > > > > >> subqueue when removing it.
> > > > > >>
> > > > > >> With O(1) insertion, booting VMs with many SaveStateEntry objects is
> > > > > >> more plausible.  For example, a ppc64 VM with maxmem=8T has 40000 such
> > > > > >> objects to insert.
> > > > > > 
> > > > > > Separate from reviewing this patch, I'd like to understand why you've
> > > > > > got 40000 objects.  This feels very very wrong and is likely to cause
> > > > > > problems to random other bits of qemu as well.
> > > > > 
> > > > > I think the 40000 objects are the "dr-connectors" that are used to plug
> > > > > peripherals (memory, pci card, cpus, ...).
> > > > 
> > > > Yes, Scott confirmed that in the reply to the previous version.
> > > > IMHO nothing in qemu is designed to deal with that many devices/objects
> > > > - I'm sure that something other than the migration code is going to
> > > > get upset.
> > > 
> > > It kind of did.  Particularly when there was n^2 and n^3
> > > behaviour in the property stuff we had some ludicrously long startup
> > > times (hours) with large maxmem values.
> > > 
> > > Fwiw, the DRCs for PCI slots, DRCs and PHBs aren't really a problem.
> > > The problem is the memory DRCs, there's one for each LMB - each 256MiB
> > > chunk of memory (or possible memory).
> > > 
> > > > Is perhaps the structure wrong somewhere - should there be a single DRC
> > > > device that knows about all DRCs?
> > > 
> > > Maybe.  The tricky bit is how to get there from here without breaking
> > > migration or something else along the way.
> > 
> > Switch on the next machine type version - it doesn't matter if migration
> > is incompatible then.
> 
> 1mo bump.
> 
> > Is there anything I need to do with this patch in particular to make it
> > suitable for merging?

Apologies for the delay;  hopefully this will go in one of the pulls
just after the tree opens again.

Please please try and work on reducing the number of objects somehow -
while this migration fix is a useful short term fix, and not too
invasive; having that many objects around qemu is a really really bad
idea so needs fixing properly.

Dave

> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion

2019-12-04 Thread Dr. David Alan Gilbert
* Scott Cheloha (chel...@linux.vnet.ibm.com) wrote:
> savevm_state's SaveStateEntry TAILQ is a priority queue.  Priority
> sorting is maintained by searching from head to tail for a suitable
> insertion spot.  Insertion is thus an O(n) operation.
> 
> If we instead keep track of the head of each priority's subqueue
> within that larger queue we can reduce this operation to O(1) time.
> 
> savevm_state_handler_remove() becomes slightly more complex to
> accommodate these gains: we need to replace the head of a priority's
> subqueue when removing it.
> 
> With O(1) insertion, booting VMs with many SaveStateEntry objects is
> more plausible.  For example, a ppc64 VM with maxmem=8T has 40000 such
> objects to insert.
> 
> Signed-off-by: Scott Cheloha 

OK, it took me a while to figure out why you didn't just
turn handlers into handlers[MIG_PRI_MAX]; but I guess the problem is
you would have to change all the foreach's scattered around that walk
the list.  So


Reviewed-by: Dr. David Alan Gilbert 

> ---
>  migration/savevm.c | 26 +++++++++++++++++++++++---
>  1 file changed, 23 insertions(+), 3 deletions(-)
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index b2e3b7222a..f7a2d36bba 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -250,6 +250,7 @@ typedef struct SaveStateEntry {
>  
>  typedef struct SaveState {
>      QTAILQ_HEAD(, SaveStateEntry) handlers;
> +    SaveStateEntry *handler_pri_head[MIG_PRI_MAX + 1];
>      int global_section_id;
>      uint32_t len;
>      const char *name;
> @@ -261,6 +262,7 @@ typedef struct SaveState {
>  
>  static SaveState savevm_state = {
>      .handlers = QTAILQ_HEAD_INITIALIZER(savevm_state.handlers),
> +    .handler_pri_head = { [MIG_PRI_DEFAULT ... MIG_PRI_MAX] = NULL },
>      .global_section_id = 0,
>  };
>  
> @@ -709,24 +711,42 @@ static void savevm_state_handler_insert(SaveStateEntry *nse)
>  {
>      MigrationPriority priority = save_state_priority(nse);
>      SaveStateEntry *se;
> +    int i;
>  
>      assert(priority <= MIG_PRI_MAX);
>  
> -    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> -        if (save_state_priority(se) < priority) {
> +    for (i = priority - 1; i >= 0; i--) {
> +        se = savevm_state.handler_pri_head[i];
> +        if (se != NULL) {
> +            assert(save_state_priority(se) < priority);
>              break;
>          }
>      }
>  
> -    if (se) {
> +    if (i >= 0) {
>          QTAILQ_INSERT_BEFORE(se, nse, entry);
>      } else {
>          QTAILQ_INSERT_TAIL(&savevm_state.handlers, nse, entry);
>      }
> +
> +    if (savevm_state.handler_pri_head[priority] == NULL) {
> +        savevm_state.handler_pri_head[priority] = nse;
> +    }
>  }
>  
>  static void savevm_state_handler_remove(SaveStateEntry *se)
>  {
> +    SaveStateEntry *next;
> +    MigrationPriority priority = save_state_priority(se);
> +
> +    if (se == savevm_state.handler_pri_head[priority]) {
> +        next = QTAILQ_NEXT(se, entry);
> +        if (next != NULL && save_state_priority(next) == priority) {
> +            savevm_state.handler_pri_head[priority] = next;
> +        } else {
> +            savevm_state.handler_pri_head[priority] = NULL;
> +        }
> +    }
>      QTAILQ_REMOVE(&savevm_state.handlers, se, entry);
>  }
>  
> -- 
> 2.23.0
> 
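For readers of the archive, here is a self-contained sketch of the same
technique -- one ordered queue plus a per-priority head pointer -- using
the portable <sys/queue.h> TAILQ macros (QEMU's QTAILQ wraps the same
structure).  All names, the 3-level priority space, and main() are
invented for illustration; the real logic lives in savevm.c as patched
above.  Keeping a single queue, rather than an array of per-priority
queues, is what preserves the existing QTAILQ_FOREACH walkers Dave
mentions.

    #include <stdio.h>
    #include <sys/queue.h>

    #define PRI_MAX 2                        /* stand-in for MIG_PRI_MAX */

    typedef struct Entry {
        int priority;
        const char *name;
        TAILQ_ENTRY(Entry) link;
    } Entry;

    static TAILQ_HEAD(, Entry) handlers = TAILQ_HEAD_INITIALIZER(handlers);
    static Entry *pri_head[PRI_MAX + 1];     /* head of each priority's subqueue */

    /* O(1) insert: find the next lower non-empty priority's head and put
     * the new entry just before it; otherwise append at the tail. */
    static void handler_insert(Entry *nent)
    {
        Entry *e = NULL;
        int i;

        for (i = nent->priority - 1; i >= 0; i--) {
            e = pri_head[i];
            if (e != NULL) {
                break;
            }
        }
        if (i >= 0) {
            TAILQ_INSERT_BEFORE(e, nent, link);
        } else {
            TAILQ_INSERT_TAIL(&handlers, nent, link);
        }
        if (pri_head[nent->priority] == NULL) {
            pri_head[nent->priority] = nent; /* first entry at this priority */
        }
    }

    /* Removing a subqueue head promotes its same-priority successor, or
     * clears the slot if the subqueue becomes empty. */
    static void handler_remove(Entry *e)
    {
        if (e == pri_head[e->priority]) {
            Entry *next = TAILQ_NEXT(e, link);
            if (next != NULL && next->priority == e->priority) {
                pri_head[e->priority] = next;
            } else {
                pri_head[e->priority] = NULL;
            }
        }
        TAILQ_REMOVE(&handlers, e, link);
    }

    int main(void)
    {
        Entry a = { 0, "low" }, b = { 2, "high" }, c = { 1, "mid" }, *e;

        handler_insert(&a);
        handler_insert(&b);
        handler_insert(&c);
        handler_remove(&b);
        TAILQ_FOREACH(e, &handlers, link) {
            printf("%s (pri %d)\n", e->name, e->priority); /* mid, then low */
        }
        return 0;
    }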
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion

2019-11-20 Thread Scott Cheloha
On Mon, Oct 21, 2019 at 09:14:44AM +0100, Dr. David Alan Gilbert wrote:
> * David Gibson (da...@gibson.dropbear.id.au) wrote:
> > On Fri, Oct 18, 2019 at 10:43:52AM +0100, Dr. David Alan Gilbert wrote:
> > > * Laurent Vivier (lviv...@redhat.com) wrote:
> > > > On 18/10/2019 10:16, Dr. David Alan Gilbert wrote:
> > > > > * Scott Cheloha (chel...@linux.vnet.ibm.com) wrote:
> > > > >> savevm_state's SaveStateEntry TAILQ is a priority queue.  Priority
> > > > >> sorting is maintained by searching from head to tail for a suitable
> > > > >> insertion spot.  Insertion is thus an O(n) operation.
> > > > >>
> > > > >> If we instead keep track of the head of each priority's subqueue
> > > > >> within that larger queue we can reduce this operation to O(1) time.
> > > > >>
> > > > >> savevm_state_handler_remove() becomes slightly more complex to
> > > > >> accommodate these gains: we need to replace the head of a priority's
> > > > >> subqueue when removing it.
> > > > >>
> > > > >> With O(1) insertion, booting VMs with many SaveStateEntry objects is
> > > > >> more plausible.  For example, a ppc64 VM with maxmem=8T has 40000 such
> > > > >> objects to insert.
> > > > > 
> > > > > Separate from reviewing this patch, I'd like to understand why you've
> > > > > got 40000 objects.  This feels very very wrong and is likely to cause
> > > > > problems to random other bits of qemu as well.
> > > > 
> > > > I think the 40000 objects are the "dr-connectors" that are used to plug
> > > > peripherals (memory, pci card, cpus, ...).
> > > 
> > > Yes, Scott confirmed that in the reply to the previous version.
> > > IMHO nothing in qemu is designed to deal with that many devices/objects
> > > - I'm sure that something other than the migration code is going to
> > > get upset.
> > 
> > It kind of did.  Particularly when there was n^2 and n^3
> > behaviour in the property stuff we had some ludicrously long startup
> > times (hours) with large maxmem values.
> > 
> > Fwiw, the DRCs for PCI slots, DRCs and PHBs aren't really a problem.
> > The problem is the memory DRCs, there's one for each LMB - each 256MiB
> > chunk of memory (or possible memory).
> > 
> > > Is perhaps the structure wrong somewhere - should there be a single DRC
> > > device that knows about all DRCs?
> > 
> > Maybe.  The tricky bit is how to get there from here without breaking
> > migration or something else along the way.
> 
> Switch on the next machine type version - it doesn't matter if migration
> is incompatible then.

1mo bump.

Is there anything I need to do with this patch in particular to make it suitable
for merging?



Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion

2019-10-21 Thread Dr. David Alan Gilbert
* David Gibson (da...@gibson.dropbear.id.au) wrote:
> On Fri, Oct 18, 2019 at 10:43:52AM +0100, Dr. David Alan Gilbert wrote:
> > * Laurent Vivier (lviv...@redhat.com) wrote:
> > > On 18/10/2019 10:16, Dr. David Alan Gilbert wrote:
> > > > * Scott Cheloha (chel...@linux.vnet.ibm.com) wrote:
> > > >> savevm_state's SaveStateEntry TAILQ is a priority queue.  Priority
> > > >> sorting is maintained by searching from head to tail for a suitable
> > > >> insertion spot.  Insertion is thus an O(n) operation.
> > > >>
> > > >> If we instead keep track of the head of each priority's subqueue
> > > >> within that larger queue we can reduce this operation to O(1) time.
> > > >>
> > > >> savevm_state_handler_remove() becomes slightly more complex to
> > > >> accommodate these gains: we need to replace the head of a priority's
> > > >> subqueue when removing it.
> > > >>
> > > >> With O(1) insertion, booting VMs with many SaveStateEntry objects is
> > > >> more plausible.  For example, a ppc64 VM with maxmem=8T has 40000 such
> > > >> objects to insert.
> > > > 
> > > > Separate from reviewing this patch, I'd like to understand why you've
> > > > got 40000 objects.  This feels very very wrong and is likely to cause
> > > > problems to random other bits of qemu as well.
> > > 
> > > I think the 40000 objects are the "dr-connectors" that are used to plug
> > > peripherals (memory, pci card, cpus, ...).
> > 
> > Yes, Scott confirmed that in the reply to the previous version.
> > IMHO nothing in qemu is designed to deal with that many devices/objects
> > - I'm sure that something other than the migration code is going to
> > get upset.
> 
> It kind of did.  Particularly when there was n^2 and n^3
> behaviour in the property stuff we had some ludicrously long startup
> times (hours) with large maxmem values.
> 
> Fwiw, the DRCs for PCI slots, DRCs and PHBs aren't really a problem.
> The problem is the memory DRCs, there's one for each LMB - each 256MiB
> chunk of memory (or possible memory).
> 
> > Is perhaps the structure wrong somewhere - should there be a single DRC
> > device that knows about all DRCs?
> 
> Maybe.  The tricky bit is how to get there from here without breaking
> migration or something else along the way.

Switch on the next machine type version - it doesn't matter if migration
is incompatible then.

Without knowing anything about the innards of DRCs, I suggest a 
DRCMulti that takes a parameter and represents 'n' DRCs at consecutive
chunks of memory.  Then use one DRCMulti for each RAMBlock or DIMM or
other convenient sized thing.
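For shape only -- nothing like this exists in QEMU today, and every name
below is hypothetical -- such a DRCMulti might carry no more than:

    #include <stdint.h>

    /* One device standing in for 'n' DRCs at consecutive LMB slots.
     * (Hypothetical sketch; a real device would also embed a DeviceState
     * parent and register QOM properties.) */
    typedef struct SpaprDrcMulti {
        uint32_t base_drc_index;  /* index of the first LMB slot covered */
        uint32_t n_drcs;          /* consecutive slots represented */
        uint32_t *drc_states;     /* per-slot state, migrated as one array */
    } SpaprDrcMulti;

One such object per RAMBlock or DIMM would then replace tens of
thousands of individual SpaprDrc entries in the savevm handler list.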

Dave

> -- 
> David Gibson                    | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
>                                 | _way_ _around_!
> http://www.ozlabs.org/~dgibson


--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion

2019-10-21 Thread David Gibson
On Fri, Oct 18, 2019 at 11:38:37AM -0500, Michael Roth wrote:
> Quoting Dr. David Alan Gilbert (2019-10-18 04:43:52)
> > * Laurent Vivier (lviv...@redhat.com) wrote:
> > > On 18/10/2019 10:16, Dr. David Alan Gilbert wrote:
> > > > * Scott Cheloha (chel...@linux.vnet.ibm.com) wrote:
> > > >> savevm_state's SaveStateEntry TAILQ is a priority queue.  Priority
> > > >> sorting is maintained by searching from head to tail for a suitable
> > > >> insertion spot.  Insertion is thus an O(n) operation.
> > > >>
> > > >> If we instead keep track of the head of each priority's subqueue
> > > >> within that larger queue we can reduce this operation to O(1) time.
> > > >>
> > > >> savevm_state_handler_remove() becomes slightly more complex to
> > > >> accommodate these gains: we need to replace the head of a priority's
> > > >> subqueue when removing it.
> > > >>
> > > >> With O(1) insertion, booting VMs with many SaveStateEntry objects is
> > > >> more plausible.  For example, a ppc64 VM with maxmem=8T has 40000 such
> > > >> objects to insert.
> > > > 
> > > > Separate from reviewing this patch, I'd like to understand why you've
> > > > got 40000 objects.  This feels very very wrong and is likely to cause
> > > > problems to random other bits of qemu as well.
> > > 
> > > I think the 40000 objects are the "dr-connectors" that are used to plug
> > > peripherals (memory, pci card, cpus, ...).
> > 
> > Yes, Scott confirmed that in the reply to the previous version.
> > IMHO nothing in qemu is designed to deal with that many devices/objects
> > - I'm sure that something other than the migration code is going to get 
> > upset.
> 
> The device/object management aspect seems to handle things *mostly* okay, at
> least ever since QOM child properties started being tracked by a hash table
> instead of a linked list. It's worth noting that that change (b604a854) was
> done to better handle IRQ pins for ARM guests with lots of CPUs. I think it is
> inevitable that certain machine types/configurations will call for large
> numbers of objects and I think it is fair to improve things to allow for this
> sort of scalability.
> 
> But I agree it shouldn't be abused, and you're right that there are some
> problem areas that arise. Trying to outline them:
> 
>  a) introspection commands like 'info qom-tree' become pretty unwieldy,
> and with large enough numbers of objects might even break things (QMP
> response size limits maybe?)
>  b) various related lists like reset handlers, vmstate/savevm handlers might
> grow quite large
> 
> I think we could work around a) with maybe flagging certain
> "internally-only" objects as 'hidden'. Introspection routines could then
> filter these out, and routines like qom-set/qom-get could return
> something similar to EACCES so they are never used/useful to management
> tools.
> 
> In cases like b) we can optimize things where it makes sense like with
> Scott's patch here. In most cases these lists need to be walked one way
> or another, whether it's done internally by the object or through common
> interfaces provided by QEMU. It's really just the O(n^2) type handling
> where relying on common interfaces becomes drastically less efficient,
> but I think we should avoid implementing things in that way anyway, or
> improve them as needed.
> 
> > 
> > Is perhaps the structure wrong somewhere - should there be a single DRC
> > device that knows about all DRCs?
> 
> That's an interesting proposition, I think it's worth exploring further,
> but from a high level:
> 
>  - each SpaprDrc has migration state, and some subclasses of SpaprDrc (e.g.
>SpaprDrcPhysical) have additional migration state. These are sent
>as-needed as separate VMState entries in the migration stream.
>Moving to a single DRC means we're either sending them as a flat
>array or a sparse list, which would put just as much load on the
>migration code (at least, with Scott's changes in place). It would
>also be difficult to do all this in a way which maintains migration
>compatibility with older machine types.
>  - other aspects of modeling these as QOM objects, such as look-ups,
>reset-handling, and memory allocations, wouldn't be dramatically
>improved upon by handling it all internally within the object
> 
> AFAICT the biggest issue with modeling the DRCs as individual objects
> is actually how we deal with introspection, and we should try to
> improve. What do you think of the alternative suggestion above of
> marking certain objects as 'hidden' from various introspection
> interfaces?

So, that's not something I'd considered particularly in this context,
but it has bothered me in other contexts.  The fact that all the QOM
interfaces are freely user-inspectable, but are also used for a bunch
of interaction between qemu components means that we (arguably)
routinely make a bunch of stuff into user-visible API which we
probably don't really need or want to.

-- 
David Gibson  

Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion

2019-10-19 Thread David Gibson
On Fri, Oct 18, 2019 at 10:43:52AM +0100, Dr. David Alan Gilbert wrote:
> * Laurent Vivier (lviv...@redhat.com) wrote:
> > On 18/10/2019 10:16, Dr. David Alan Gilbert wrote:
> > > * Scott Cheloha (chel...@linux.vnet.ibm.com) wrote:
> > >> savevm_state's SaveStateEntry TAILQ is a priority queue.  Priority
> > >> sorting is maintained by searching from head to tail for a suitable
> > >> insertion spot.  Insertion is thus an O(n) operation.
> > >>
> > >> If we instead keep track of the head of each priority's subqueue
> > >> within that larger queue we can reduce this operation to O(1) time.
> > >>
> > >> savevm_state_handler_remove() becomes slightly more complex to
> > >> accommodate these gains: we need to replace the head of a priority's
> > >> subqueue when removing it.
> > >>
> > >> With O(1) insertion, booting VMs with many SaveStateEntry objects is
> > >> more plausible.  For example, a ppc64 VM with maxmem=8T has 40000 such
> > >> objects to insert.
> > > 
> > > Separate from reviewing this patch, I'd like to understand why you've
> > > got 40000 objects.  This feels very very wrong and is likely to cause
> > > problems to random other bits of qemu as well.
> > 
> > I think the 40000 objects are the "dr-connectors" that are used to plug
> > peripherals (memory, pci card, cpus, ...).
> 
> Yes, Scott confirmed that in the reply to the previous version.
> IMHO nothing in qemu is designed to deal with that many devices/objects
> - I'm sure that something other than the migration code is going to
> get upset.

It kind of did.  Particularly when there was n^2 and n^3
behaviour in the property stuff we had some ludicrously long startup
times (hours) with large maxmem values.

Fwiw, the DRCs for PCI slots, DRCs and PHBs aren't really a problem.
The problem is the memory DRCs, there's one for each LMB - each 256MiB
chunk of memory (or possible memory).
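(For scale: 8 TiB of maxmem in 256 MiB LMBs is 8192 GiB / 0.25 GiB =
32768 memory DRCs; add the CPU, PCI-slot and PHB DRCs and that is
roughly the 40000 SaveStateEntry objects discussed above.)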

> Is perhaps the structure wrong somewhere - should there be a single DRC
> device that knows about all DRCs?

Maybe.  The tricky bit is how to get there from here without breaking
migration or something else along the way.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion

2019-10-18 Thread Dr. David Alan Gilbert
* Michael Roth (mdr...@linux.vnet.ibm.com) wrote:
> Quoting Dr. David Alan Gilbert (2019-10-18 04:43:52)
> > * Laurent Vivier (lviv...@redhat.com) wrote:
> > > On 18/10/2019 10:16, Dr. David Alan Gilbert wrote:
> > > > * Scott Cheloha (chel...@linux.vnet.ibm.com) wrote:
> > > >> savevm_state's SaveStateEntry TAILQ is a priority queue.  Priority
> > > >> sorting is maintained by searching from head to tail for a suitable
> > > >> insertion spot.  Insertion is thus an O(n) operation.
> > > >>
> > > >> If we instead keep track of the head of each priority's subqueue
> > > >> within that larger queue we can reduce this operation to O(1) time.
> > > >>
> > > >> savevm_state_handler_remove() becomes slightly more complex to
> > > >> accommodate these gains: we need to replace the head of a priority's
> > > >> subqueue when removing it.
> > > >>
> > > >> With O(1) insertion, booting VMs with many SaveStateEntry objects is
> > > >> more plausible.  For example, a ppc64 VM with maxmem=8T has 40000 such
> > > >> objects to insert.
> > > > 
> > > > Separate from reviewing this patch, I'd like to understand why you've
> > > > got 40000 objects.  This feels very very wrong and is likely to cause
> > > > problems to random other bits of qemu as well.
> > > 
> > > I think the 40000 objects are the "dr-connectors" that are used to plug
> > > peripherals (memory, pci card, cpus, ...).
> > 
> > Yes, Scott confirmed that in the reply to the previous version.
> > IMHO nothing in qemu is designed to deal with that many devices/objects
> > - I'm sure that something other than the migration code is going to get 
> > upset.
> 
> The device/object management aspect seems to handle things *mostly* okay, at
> least ever since QOM child properties started being tracked by a hash table
> instead of a linked list. It's worth noting that that change (b604a854) was
> done to better handle IRQ pins for ARM guests with lots of CPUs. I think it is
> inevitable that certain machine types/configurations will call for large
> numbers of objects and I think it is fair to improve things to allow for this
> sort of scalability.
> 
> But I agree it shouldn't be abused, and you're right that there are some
> problem areas that arise. Trying to outline them:
> 
>  a) introspection commands like 'info qom-tree' become pretty unwieldy,
> and with large enough numbers of objects might even break things (QMP
> response size limits maybe?)
>  b) various related lists like reset handlers, vmstate/savevm handlers might
> grow quite large
> 
> I think we could work around a) with maybe flagging certain
> "internally-only" objects as 'hidden'. Introspection routines could then
> filter these out, and routines like qom-set/qom-get could return
> something similar to EACCES so they are never used/useful to management
> tools.
> 
> In cases like b) we can optimize things where it makes sense like with
> Scott's patch here. In most cases these lists need to be walked one way
> or another, whether it's done internally by the object or through common
> interfaces provided by QEMU. It's really just the O(n^2) type handling
> where relying on common interfaces becomes drastically less efficient,
> but I think we should avoid implementing things in that way anyway, or
> improve them as needed.
> 
> > 
> > Is perhaps the structure wrong somewhere - should there be a single DRC
> > device that knows about all DRCs?
> 
> That's an interesting proposition, I think it's worth exploring further,
> but from a high level:
> 
>  - each SpaprDrc has migration state, and some subclasses of SpaprDrc (e.g.
>SpaprDrcPhysical) have additional migration state. These are sent
>as-needed as separate VMState entries in the migration stream.
>Moving to a single DRC means we're either sending them as a flat
>array or a sparse list, which would put just as much load on the
>migration code (at least, with Scott's changes in place). It would
>also be difficult to do all this in a way which maintains migration
>compatibility with older machine types.

Having sparse arrays etc within a vmstate isn't as bad; none of
them actually need to be 'objects' as such - even if you have
separate chunks of VMState.
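As a hedged sketch of that shape (the VMSTATE_* helpers are QEMU's real
ones from migration/vmstate.h, but the device name and fields are
invented, and this would only compile inside the QEMU tree):

    typedef struct DrcMultiState {
        uint32_t n_drcs;
        uint32_t *drc_states;          /* one word of state per LMB slot */
    } DrcMultiState;

    /* All per-LMB DRC state travels as one variable-length array in a
     * single VMState section instead of ~32k separate SaveStateEntry
     * objects.  drc_states must be allocated (n_drcs words) before an
     * incoming migration loads into it. */
    static const VMStateDescription vmstate_drc_multi = {
        .name = "spapr_drc_multi",     /* hypothetical section name */
        .version_id = 1,
        .minimum_version_id = 1,
        .fields = (VMStateField[]) {
            VMSTATE_UINT32(n_drcs, DrcMultiState),
            VMSTATE_VARRAY_UINT32(drc_states, DrcMultiState, n_drcs, 0,
                                  vmstate_info_uint32, uint32_t),
            VMSTATE_END_OF_LIST()
        }
    };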

>  - other aspects of modeling these as QOM objects, such as look-ups,
>reset-handling, and memory allocations, wouldn't be dramatically
>improved upon by handling it all internally within the object
> 
> AFAICT the biggest issue with modeling the DRCs as individual objects
> is actually how we deal with introspection, and we should try to
> improve. What do you think of the alternative suggestion above of
> marking certain objects as 'hidden' from various introspection
> interfaces?

That's one for someone who knows/cares about QOM more than me;
Paolo, Dan Berrange, or Eduardo Habkost are QOM people.

Dave

> > 
> > Dave
> > 
> > 
> > > https://github.com/qemu/qemu/blob/master/hw/ppc/spapr_drc.c
> > > 
> > > They are part of SPAPR specification.

Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion

2019-10-18 Thread Michael Roth
Quoting Dr. David Alan Gilbert (2019-10-18 04:43:52)
> * Laurent Vivier (lviv...@redhat.com) wrote:
> > On 18/10/2019 10:16, Dr. David Alan Gilbert wrote:
> > > * Scott Cheloha (chel...@linux.vnet.ibm.com) wrote:
> > >> savevm_state's SaveStateEntry TAILQ is a priority queue.  Priority
> > >> sorting is maintained by searching from head to tail for a suitable
> > >> insertion spot.  Insertion is thus an O(n) operation.
> > >>
> > >> If we instead keep track of the head of each priority's subqueue
> > >> within that larger queue we can reduce this operation to O(1) time.
> > >>
> > >> savevm_state_handler_remove() becomes slightly more complex to
> > >> accomodate these gains: we need to replace the head of a priority's
> > >> subqueue when removing it.
> > >>
> > >> With O(1) insertion, booting VMs with many SaveStateEntry objects is
> > >> more plausible.  For example, a ppc64 VM with maxmem=8T has 40000 such
> > >> objects to insert.
> > > 
> > > Separate from reviewing this patch, I'd like to understand why you've
> > > got 40000 objects.  This feels very very wrong and is likely to cause
> > > problems to random other bits of qemu as well.
> > 
> > I think the 40000 objects are the "dr-connectors" that are used to plug
> > peripherals (memory, pci card, cpus, ...).
> 
> Yes, Scott confirmed that in the reply to the previous version.
> IMHO nothing in qemu is designed to deal with that many devices/objects
> - I'm sure that something other than the migration code is going to get upset.

The device/object management aspect seems to handle things *mostly* okay, at
least ever since QOM child properties started being tracked by a hash table
instead of a linked list. It's worth noting that that change (b604a854) was
done to better handle IRQ pins for ARM guests with lots of CPUs. I think it is
inevitable that certain machine types/configurations will call for large
numbers of objects and I think it is fair to improve things to allow for this
sort of scalability.

But I agree it shouldn't be abused, and you're right that there are some
problem areas that arise. Trying to outline them:

 a) introspection commands like 'info qom-tree' become pretty unwieldy,
and with large enough numbers of objects might even break things (QMP
response size limits maybe?)
 b) various related lists like reset handlers, vmstate/savevm handlers might
grow quite large

I think we could work around a) with maybe flagging certain
"internally-only" objects as 'hidden'. Introspection routines could then
filter these out, and routines like qom-set/qom-get could return
something similar to EACCES so they are never used/useful to management
tools.
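To make the 'hidden' idea concrete, a toy sketch (QOM has no such flag
today; the types and the walk below are invented purely to show the
filtering, and compile standalone):

    #include <stdio.h>
    #include <stdbool.h>
    #include <stddef.h>

    typedef struct ToyObject {
        const char *name;
        bool hidden;                  /* internal-only: skip in introspection */
        struct ToyObject **children;
        size_t n_children;
    } ToyObject;

    /* An 'info qom-tree'-style walk skips hidden subtrees entirely, so
     * tens of thousands of internal DRCs never reach a management tool. */
    static void toy_tree_print(const ToyObject *obj, int depth)
    {
        if (obj->hidden) {
            return;
        }
        printf("%*s%s\n", depth * 2, "", obj->name);
        for (size_t i = 0; i < obj->n_children; i++) {
            toy_tree_print(obj->children[i], depth + 1);
        }
    }

    int main(void)
    {
        ToyObject drc = { "drc-lmb-0", true, NULL, 0 };
        ToyObject *kids[] = { &drc };
        ToyObject root = { "machine", false, kids, 1 };

        toy_tree_print(&root, 0);     /* prints only "machine" */
        return 0;
    }

A qom-get/qom-set on a hidden path would likewise fail up front with an
EACCES-style error instead of exposing the object.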

In cases like b) we can optimize things where it makes sense like with
Scott's patch here. In most cases these lists need to be walked one way
or another, whether it's done internally by the object or through common
interfaces provided by QEMU. It's really just the O(n^2) type handling
where relying on common interfaces becomes drastically less efficient,
but I think we should avoid implementing things in that way anyway, or
improve them as needed.

> 
> Is perhaps the structure wrong somewhere - should there be a single DRC
> device that knows about all DRCs?

That's an interesting proposition, I think it's worth exploring further,
but from a high level:

 - each SpaprDrc has migration state, and some subclasses of SpaprDrc (e.g.
   SpaprDrcPhysical) have additional migration state. These are sent
   as-needed as separate VMState entries in the migration stream.
   Moving to a single DRC means we're either sending them as a flat
   array or a sparse list, which would put just as much load on the
   migration code (at least, with Scott's changes in place). It would
   also be difficult to do all this in a way which maintains migration
   compatibility with older machine types.
 - other aspects of modeling these as QOM objects, such as look-ups,
   reset-handling, and memory allocations, wouldn't be dramatically
   improved upon by handling it all internally within the object

AFAICT the biggest issue with modeling the DRCs as individual objects
is actually how we deal with introspection, and we should try to
improve. What do you think of the alternative suggestion above of
marking certain objects as 'hidden' from various introspection
interfaces?

> 
> Dave
> 
> 
> > https://github.com/qemu/qemu/blob/master/hw/ppc/spapr_drc.c
> > 
> > They are part of SPAPR specification.
> > 
> > https://raw.githubusercontent.com/qemu/qemu/master/docs/specs/ppc-spapr-hotplug.txt
> > 
> > CC Michael Roth
> > 
> > Thanks,
> > Laurent
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> 



Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion

2019-10-18 Thread Dr. David Alan Gilbert
* Laurent Vivier (lviv...@redhat.com) wrote:
> On 18/10/2019 10:16, Dr. David Alan Gilbert wrote:
> > * Scott Cheloha (chel...@linux.vnet.ibm.com) wrote:
> >> savevm_state's SaveStateEntry TAILQ is a priority queue.  Priority
> >> sorting is maintained by searching from head to tail for a suitable
> >> insertion spot.  Insertion is thus an O(n) operation.
> >>
> >> If we instead keep track of the head of each priority's subqueue
> >> within that larger queue we can reduce this operation to O(1) time.
> >>
> >> savevm_state_handler_remove() becomes slightly more complex to
> >> accommodate these gains: we need to replace the head of a priority's
> >> subqueue when removing it.
> >>
> >> With O(1) insertion, booting VMs with many SaveStateEntry objects is
> >> more plausible.  For example, a ppc64 VM with maxmem=8T has 40000 such
> >> objects to insert.
> > 
> > Separate from reviewing this patch, I'd like to understand why you've
> > got 40000 objects.  This feels very very wrong and is likely to cause
> > problems to random other bits of qemu as well.
> 
> I think the 40000 objects are the "dr-connectors" that are used to plug
> peripherals (memory, pci card, cpus, ...).

Yes, Scott confirmed that in the reply to the previous version.
IMHO nothing in qemu is designed to deal with that many devices/objects
- I'm sure that something other than the migration code is going to get upset.

Is perhaps the structure wrong somewhere - should there be a single DRC
device that knows about all DRCs?

Dave


> https://github.com/qemu/qemu/blob/master/hw/ppc/spapr_drc.c
> 
> They are part of SPAPR specification.
> 
> https://raw.githubusercontent.com/qemu/qemu/master/docs/specs/ppc-spapr-hotplug.txt
> 
> CC Michael Roth
> 
> Thanks,
> Laurent
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion

2019-10-18 Thread Laurent Vivier
On 18/10/2019 10:16, Dr. David Alan Gilbert wrote:
> * Scott Cheloha (chel...@linux.vnet.ibm.com) wrote:
>> savevm_state's SaveStateEntry TAILQ is a priority queue.  Priority
>> sorting is maintained by searching from head to tail for a suitable
>> insertion spot.  Insertion is thus an O(n) operation.
>>
>> If we instead keep track of the head of each priority's subqueue
>> within that larger queue we can reduce this operation to O(1) time.
>>
>> savevm_state_handler_remove() becomes slightly more complex to
>> accommodate these gains: we need to replace the head of a priority's
>> subqueue when removing it.
>>
>> With O(1) insertion, booting VMs with many SaveStateEntry objects is
>> more plausible.  For example, a ppc64 VM with maxmem=8T has 40000 such
>> objects to insert.
> 
> Separate from reviewing this patch, I'd like to understand why you've
> got 40000 objects.  This feels very very wrong and is likely to cause
> problems to random other bits of qemu as well.

I think the 40000 objects are the "dr-connectors" that are used to plug
peripherals (memory, pci card, cpus, ...).

https://github.com/qemu/qemu/blob/master/hw/ppc/spapr_drc.c

They are part of SPAPR specification.

https://raw.githubusercontent.com/qemu/qemu/master/docs/specs/ppc-spapr-hotplug.txt

CC Michael Roth

Thanks,
Laurent



Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion

2019-10-18 Thread Dr. David Alan Gilbert
* Scott Cheloha (chel...@linux.vnet.ibm.com) wrote:
> savevm_state's SaveStateEntry TAILQ is a priority queue.  Priority
> sorting is maintained by searching from head to tail for a suitable
> insertion spot.  Insertion is thus an O(n) operation.
> 
> If we instead keep track of the head of each priority's subqueue
> within that larger queue we can reduce this operation to O(1) time.
> 
> savevm_state_handler_remove() becomes slightly more complex to
> accommodate these gains: we need to replace the head of a priority's
> subqueue when removing it.
> 
> With O(1) insertion, booting VMs with many SaveStateEntry objects is
> more plausible.  For example, a ppc64 VM with maxmem=8T has 40000 such
> objects to insert.

Separate from reviewing this patch, I'd like to understand why you've
got 40000 objects.  This feels very very wrong and is likely to cause
problems to random other bits of qemu as well.

Dave

> Signed-off-by: Scott Cheloha 
> ---
>  migration/savevm.c | 26 +++++++++++++++++++++++---
>  1 file changed, 23 insertions(+), 3 deletions(-)
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index b2e3b7222a..f7a2d36bba 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -250,6 +250,7 @@ typedef struct SaveStateEntry {
>  
>  typedef struct SaveState {
>      QTAILQ_HEAD(, SaveStateEntry) handlers;
> +    SaveStateEntry *handler_pri_head[MIG_PRI_MAX + 1];
>      int global_section_id;
>      uint32_t len;
>      const char *name;
> @@ -261,6 +262,7 @@ typedef struct SaveState {
>  
>  static SaveState savevm_state = {
>      .handlers = QTAILQ_HEAD_INITIALIZER(savevm_state.handlers),
> +    .handler_pri_head = { [MIG_PRI_DEFAULT ... MIG_PRI_MAX] = NULL },
>      .global_section_id = 0,
>  };
>  
> @@ -709,24 +711,42 @@ static void savevm_state_handler_insert(SaveStateEntry *nse)
>  {
>      MigrationPriority priority = save_state_priority(nse);
>      SaveStateEntry *se;
> +    int i;
>  
>      assert(priority <= MIG_PRI_MAX);
>  
> -    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> -        if (save_state_priority(se) < priority) {
> +    for (i = priority - 1; i >= 0; i--) {
> +        se = savevm_state.handler_pri_head[i];
> +        if (se != NULL) {
> +            assert(save_state_priority(se) < priority);
>              break;
>          }
>      }
>  
> -    if (se) {
> +    if (i >= 0) {
>          QTAILQ_INSERT_BEFORE(se, nse, entry);
>      } else {
>          QTAILQ_INSERT_TAIL(&savevm_state.handlers, nse, entry);
>      }
> +
> +    if (savevm_state.handler_pri_head[priority] == NULL) {
> +        savevm_state.handler_pri_head[priority] = nse;
> +    }
>  }
>  
>  static void savevm_state_handler_remove(SaveStateEntry *se)
>  {
> +    SaveStateEntry *next;
> +    MigrationPriority priority = save_state_priority(se);
> +
> +    if (se == savevm_state.handler_pri_head[priority]) {
> +        next = QTAILQ_NEXT(se, entry);
> +        if (next != NULL && save_state_priority(next) == priority) {
> +            savevm_state.handler_pri_head[priority] = next;
> +        } else {
> +            savevm_state.handler_pri_head[priority] = NULL;
> +        }
> +    }
>      QTAILQ_REMOVE(&savevm_state.handlers, se, entry);
>  }
>  
> -- 
> 2.23.0
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK