Re: [Qemu-devel] [Qemu-ppc] [PATCH 3/4] spapr: disable hotplugging without OS

2017-05-30 Thread David Gibson
On Tue, May 30, 2017 at 12:15:59PM -0500, Michael Roth wrote:
> Quoting David Gibson (2017-05-24 22:16:26)
> > On Wed, May 24, 2017 at 12:40:37PM -0500, Michael Roth wrote:
> > > Quoting Laurent Vivier (2017-05-24 11:02:30)
> > > > On 24/05/2017 17:54, Greg Kurz wrote:
> > > > > On Wed, 24 May 2017 12:14:02 +0200
> > > > > Igor Mammedov  wrote:
> > > > > 
> > > > >> On Wed, 24 May 2017 11:28:57 +0200
> > > > >> Greg Kurz  wrote:
> > > > >>
> > > > >>> On Wed, 24 May 2017 15:07:54 +1000
> > > > >>> David Gibson  wrote:
> > > > >>>   
> > > >  On Tue, May 23, 2017 at 01:18:11PM +0200, Laurent Vivier wrote:
> > > > > If the OS is not started, QEMU sends an event to the OS
> > > > > that is lost and cannot be recovered. An unplug is not
> > > > > able to restore QEMU in a coherent state.
> > > > > So, while the OS is not started, disable CPU and memory hotplug.
> > > > > We use option vector 6 to know if the OS is started
> > > > >
> > > > > Signed-off-by: Laurent Vivier   
> > > > 
> > > >  Urgh.. I'm not terribly confident that this is really correct.  As
> > > >  discussed on the previous patch, you're essentially using OV6 as a
> > > >  flag that CAS is complete.
> > > > 
> > > >  But while it undoubtedly makes the race window much smaller, I don't
> > > >  see that there's any guarantee the guest OS will really be able to
> > > >  handle hotplug events immediately after CAS.
> > > > 
> > > >  In particular if the CAS process completes partially but then needs to
> > > >  trigger a reboot, I think that would end up setting the ov6 variable,
> > > >  but the OS would definitely not be in a state to accept events.
> > > > >> wouldn't guest on reboot pick up updated fdt and online hotplugged
> > > > >> before crash cpu along with initial cpus?
> > > > >>
> > > > > 
> > > > > Yes and that's what actually happens with cpus.
> > > > > 
> > > > > But catching up with the background for this series, I have the
> > > > > impression that the issue isn't the fact we lose an event if the OS
> > > > > isn't started (which is not true), but more something wrong happening
> > > > > when hotplugging+unplugging memory as described in this commit:
> > > > > 
> > > > > commit fe6824d12642b005c69123ecf8631f9b13553f8b
> > > > > Author: Laurent Vivier 
> > > > > Date:   Tue Mar 28 14:09:34 2017 +0200
> > > > > 
> > > > > spapr: fix memory hot-unplugging
> > > > > 
> > > > 
> > > > Yes, this commit tries to fix that, but it's not possible. Some objects
> > > > remain in memory: you can see with "info cpus" or "info memory-devices"
> > > > that they are not really removed, which prevents hotplugging them
> > > > again, and in the case of memory hot-unplug we can also rerun
> > > > the device_del and crash QEMU (as before the fix).
> > > > 
> > > > Moreover, none of the state normally cleared in detach() gets cleared, and
> > > > we can't do it later in set_allocation_state() because some of it is in use
> > > > by the kernel, and this is the last call from the kernel.
> > > 
> > > Focusing on the hotplug/add case, it's a bit odd that the guest would be
> > > using the memory even though the hotplug event is clearly still sitting
> > > in the queue.
> > > 
> > > I think part of the issue is us not having a clear enough distinction in
> > > the code between what constitutes the need for "boot-time" handling vs.
> > > "hotplug" handling.
> > > 
> > > We have this hook in spapr_add_lmbs:
> > > 
> > > if (!dev->hotplugged) {
> > >     /* guests expect coldplugged LMBs to be pre-allocated */
> > >     drck->set_allocation_state(drc, SPAPR_DR_ALLOCATION_STATE_USABLE);
> > >     drck->set_isolation_state(drc, SPAPR_DR_ISOLATION_STATE_UNISOLATED);
> > > }
> > > 
> > > Whereas the default allocation/isolation state for LMBs in spapr_drc.c is
> > > UNUSABLE/ISOLATED, which is what covers the dev->hotplugged == true case.
> > > 
> > > I need to spend some time testing to confirm, but trying to walk through the
> > > various scenarios looking at the code:
> > > 
> > > case 1)
> > > 
> > > If the hotplug occurs before reset (not sure how likely this is), the event
> > > will get dropped by the reset handler, and the DRC stuff will be left in
> > > UNUSABLE/ISOLATED. I think it's more appropriate to treat this as "boot-time"
> > > and set it to USABLE/UNISOLATED like the !dev->hotplugged case.
> > 
> > Right.  It looks like we might need to go through all DRCs and sanitize
> > their state at reset time.  Essentially whatever their state before
> > the reset, they should appear as cold-plugged after the reset, I
> > think.
> > 
> > > case 2)
> > > 
> > > If the hotplug occurs after reset, but before CAS,
> > > spapr_populate_drconf_memory will be called to populate the DT with all active
> > > LMBs. AFAICT, for hotplugged LMBs it marks everything where
> > > memory_reg

Re: [Qemu-devel] [Qemu-ppc] [PATCH 3/4] spapr: disable hotplugging without OS

2017-05-30 Thread Michael Roth
Quoting David Gibson (2017-05-24 22:16:26)
> On Wed, May 24, 2017 at 12:40:37PM -0500, Michael Roth wrote:
> > Quoting Laurent Vivier (2017-05-24 11:02:30)
> > > On 24/05/2017 17:54, Greg Kurz wrote:
> > > > On Wed, 24 May 2017 12:14:02 +0200
> > > > Igor Mammedov  wrote:
> > > > 
> > > >> On Wed, 24 May 2017 11:28:57 +0200
> > > >> Greg Kurz  wrote:
> > > >>
> > > >>> On Wed, 24 May 2017 15:07:54 +1000
> > > >>> David Gibson  wrote:
> > > >>>   
> > >  On Tue, May 23, 2017 at 01:18:11PM +0200, Laurent Vivier wrote:
> > > > If the OS is not started, QEMU sends an event to the OS
> > > > that is lost and cannot be recovered. An unplug is not
> > > > able to restore QEMU in a coherent state.
> > > > So, while the OS is not started, disable CPU and memory hotplug.
> > > > We use option vector 6 to know if the OS is started
> > > >
> > > > Signed-off-by: Laurent Vivier   
> > > 
> > >  Urgh.. I'm not terribly confident that this is really correct.  As
> > >  discussed on the previous patch, you're essentially using OV6 as a
> > >  flag that CAS is complete.
> > > 
> > >  But while it undoubtedly makes the race window much smaller, I don't
> > >  see that there's any guarantee the guest OS will really be able to
> > >  handle hotplug events immediately after CAS.
> > > 
> > >  In particular if the CAS process completes partially but then needs to
> > >  trigger a reboot, I think that would end up setting the ov6 variable,
> > >  but the OS would definitely not be in a state to accept events.  
> > > >> wouldn't guest on reboot pick up updated fdt and online hotplugged
> > > >> before crash cpu along with initial cpus?
> > > >>
> > > > 
> > > > Yes and that's what actually happens with cpus.
> > > > 
> > > > But catching up with the background for this series, I have the
> > > > impression that the issue isn't the fact we lose an event if the OS
> > > > isn't started (which is not true), but more something wrong happening
> > > > when hotplugging+unplugging memory as described in this commit:
> > > > 
> > > > commit fe6824d12642b005c69123ecf8631f9b13553f8b
> > > > Author: Laurent Vivier 
> > > > Date:   Tue Mar 28 14:09:34 2017 +0200
> > > > 
> > > > spapr: fix memory hot-unplugging
> > > > 
> > > 
> > > Yes, this commit tries to fix that, but it's not possible. Some objects
> > > remain in memory: you can see with "info cpus" or "info memory-devices"
> > > that they are not really removed, which prevents hotplugging them
> > > again, and in the case of memory hot-unplug we can also rerun
> > > the device_del and crash QEMU (as before the fix).
> > > 
> > > Moreover, none of the state normally cleared in detach() gets cleared, and
> > > we can't do it later in set_allocation_state() because some of it is in use
> > > by the kernel, and this is the last call from the kernel.
> > 
> > Focusing on the hotplug/add case, it's a bit odd that the guest would be
> > using the memory even though the hotplug event is clearly still sitting
> > in the queue.
> > 
> > I think part of the issue is us not having a clear enough distinction in
> > the code between what constitutes the need for "boot-time" handling vs.
> > "hotplug" handling.
> > 
> > We have this hook in spapr_add_lmbs:
> > 
> > if (!dev->hotplugged) {
> >     /* guests expect coldplugged LMBs to be pre-allocated */
> >     drck->set_allocation_state(drc, SPAPR_DR_ALLOCATION_STATE_USABLE);
> >     drck->set_isolation_state(drc, SPAPR_DR_ISOLATION_STATE_UNISOLATED);
> > }
> > 
> > Whereas the default allocation/isolation state for LMBs in spapr_drc.c is
> > UNUSABLE/ISOLATED, which is what covers the dev->hotplugged == true case.
> > 
> > I need to spend some time testing to confirm, but trying to walk through the
> > various scenarios looking at the code:
> > 
> > case 1)
> > 
> > If the hotplug occurs before reset (not sure how likely this is), the event
> > will get dropped by the reset handler, and the DRC stuff will be left in
> > UNUSABLE/ISOLATED. I think it's more appropriate to treat this as "boot-time"
> > and set it to USABLE/UNISOLATED like the !dev->hotplugged case.
> 
> Right.  It looks like we might need to go through all DRCs and sanitize
> their state at reset time.  Essentially whatever their state before
> the reset, they should appear as cold-plugged after the reset, I
> think.
> 
> > case 2)
> > 
> > If the hotplug occurs after reset, but before CAS,
> > spapr_populate_drconf_memory will be called to populate the DT with all active
> > LMBs. AFAICT, for hotplugged LMBs it marks everything where
> > memory_region_present(get_system_memory(), addr) == true as
> > SPAPR_LMB_FLAGS_ASSIGNED. Since the region is mapped regardless of whether the
> > guest has acknowledged the hotplug, I think this would end up presenting the
> > LMB as having been present at boot-time. However, they will still be in the
> > UNUSABLE/ISOLATED

Re: [Qemu-devel] [Qemu-ppc] [PATCH 3/4] spapr: disable hotplugging without OS

2017-05-24 Thread David Gibson
On Wed, May 24, 2017 at 12:14:02PM +0200, Igor Mammedov wrote:
> On Wed, 24 May 2017 11:28:57 +0200
> Greg Kurz  wrote:
> 
> > On Wed, 24 May 2017 15:07:54 +1000
> > David Gibson  wrote:
> > 
> > > On Tue, May 23, 2017 at 01:18:11PM +0200, Laurent Vivier wrote:  
> > > > If the OS is not started, QEMU sends an event to the OS
> > > > that is lost and cannot be recovered. An unplug is not
> > > > able to restore QEMU in a coherent state.
> > > > So, while the OS is not started, disable CPU and memory hotplug.
> > > > We use option vector 6 to know if the OS is started
> > > > 
> > > > Signed-off-by: Laurent Vivier 
> > > 
> > > Urgh.. I'm not terribly confident that this is really correct.  As
> > > discussed on the previous patch, you're essentially using OV6 as a
> > > flag that CAS is complete.
> > > 
> > > But while it undoubtedly makes the race window much smaller, I don't
> > > see that there's any guarantee the guest OS will really be able to
> > > handle hotplug events immediately after CAS.
> > > 
> > > In particular if the CAS process completes partially but then needs to
> > > trigger a reboot, I think that would end up setting the ov6 variable,
> > > but the OS would definitely not be in a state to accept events.
> wouldn't guest on reboot pick up updated fdt and online hotplugged
> before crash cpu along with initial cpus?

Ah.. yes, I guess so.  Are we already resetting DRC and pending event
state at reset?

> 
> > We never have any guarantee that the OS will process an event that
> > we've sent actually (think of a kernel crash just after a successful
> > CAS negotiation for example, or any failure with the various guest
> > components involved in the process of hotplug).
> > 
> > > Mike, I really think we need some input from someone familiar with how
> > > these hotplug events are supposed to work.  What do we need to do to
> > > handle lost or stale events, such as those delivered when an OS is not
> > > booted?
> > >   
> > 
> > AFAIK, in the PowerVM world, the HMC exposes a user configurable timeout.
> > 
> > https://www.ibm.com/support/knowledgecenter/POWER8/p8hat/p8hat_dlparprocpoweraddp6.htm
> > 
> > I'm not sure we can do anything better than being able to "cancel" a previous
> > hotplug attempt if it takes too long, but I'm not necessarily the expert you're
> > looking for :)
> From the x86/ACPI world:
>  - if hotplug happens early at boot, before the guest OS is running, the
>    hotplug notification (SCI interrupt) stays pending, and once the guest
>    is up it will/should handle it and online the CPU

Yeah.. I'm not sure how this will play out on pseries; I suspect the
problem is here.

>  - if the guest crashed and is rebooted, it will pick up the updated ACPI tables
>    (fdt equivalent) with all present cpus (including the one hotplugged before
>    the crash) and online the hotplugged cpu along with the coldplugged ones

I think that should work ok for us, as long as we're properly
resetting in-flight hotplug state at a reset.
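
A minimal sketch of what "properly resetting in-flight hotplug state" could mean, written against a hypothetical `HotplugMachine` model (the struct, its fields and the helper functions are illustrative and do not exist in QEMU under these names). It assumes a flag recording CAS completion, in the spirit of the OV6 check in this patch, plus a small queue of undelivered events. Clearing both on every system reset means a CAS-triggered reboot cannot leave the flag set; dropping the queued events is only safe under the assumption that the rebuilt device tree already describes every plugged device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING_EVENTS 16

/* Hypothetical model, not QEMU's sPAPRMachineState. */
typedef struct {
    bool cas_done;                   /* set once the guest completes CAS */
    size_t n_pending;                /* hotplug events not yet fetched by the guest */
    int pending[MAX_PENDING_EVENTS]; /* opaque event identifiers */
} HotplugMachine;

/* Called when the guest finishes CAS negotiation. */
static void machine_cas_complete(HotplugMachine *m)
{
    assert(m != NULL);
    m->cas_done = true;
}

/* Queue an event for the guest; returns false if the queue is full. */
static bool machine_queue_event(HotplugMachine *m, int ev)
{
    assert(m != NULL);
    if (m->n_pending == MAX_PENDING_EVENTS) {
        return false;
    }
    m->pending[m->n_pending++] = ev;
    return true;
}

/*
 * System reset (including a reboot requested during CAS): forget everything
 * tied to the previous guest instance. The guest is assumed to rebuild its
 * view of plugged devices from the fresh device tree, so queued events are
 * dropped rather than replayed.
 */
static void machine_reset(HotplugMachine *m)
{
    assert(m != NULL);
    m->cas_done = false;
    m->n_pending = 0;
}
```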

>  - if the guest loses the SCI somehow, it's considered a guest issue, and such
>    a cpu stays unpluggable until the guest picks it up somehow (reboot, manually
>    running the cpus scan method from ACPI, or another cpu hotplug event) and
>    explicitly ejects it.
> 
> Taking into account that CPUs don't support surprise removal and require
> guest cooperation, it's fine to leave the CPU plugged in until the guest ejects it.
> That's what I'd expect to happen on bare metal:
> you hotplug a CPU, the hardware notifies the OS about it, and that's all;
> the cpu won't suddenly pop out if the OS isn't able to online it.
> 
> Moreover, that hotplugged cpu might be executing some code, or one of the
> already present cpus might be executing initialization routines to online
> it (think of host overcommit and arbitrary delays), so it is not really safe
> to remove a hotplugged but not yet onlined cpu without OS consent
> (i.e. an explicit eject by the OS/firmware). I think the lost event handling
> should be fixed on the guest side and not in QEMU.

I agree in principle, but it's not yet clear what needs to be done.
I'm guessing the problem amounts to lost events, but I'm not
certain.  The question is whether the mechanism we're using to present
the events has a means to safely avoid losing them.  Are they being
presented and lost during SLOF?  If so, is there some way we can prevent that?
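
One way to make the "can we avoid losing them" question concrete is a level-triggered event source in the style Igor describes for the ACPI SCI: an event stays queued until the guest explicitly fetches it, and the interrupt line stays asserted while anything is queued. The sketch below is a generic model with hypothetical names (`EventSource`, `event_push`, `event_fetch`); it does not claim to reflect how QEMU's sPAPR event code is structured, and whether that path can be given these semantics is the open question here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 8

/* Hypothetical level-triggered event source; names are illustrative only. */
typedef struct {
    int events[QUEUE_CAP];  /* ring buffer of opaque event identifiers */
    size_t head;            /* index of the oldest queued event */
    size_t count;           /* number of queued events */
    bool irq_level;         /* true while the interrupt line is asserted */
} EventSource;

/* Level-triggered: the line stays asserted as long as anything is queued. */
static void update_irq(EventSource *s)
{
    s->irq_level = s->count > 0;
}

/* Raise an event; returns false if the queue is full. */
static bool event_push(EventSource *s, int ev)
{
    assert(s != NULL);
    if (s->count == QUEUE_CAP) {
        return false;
    }
    s->events[(s->head + s->count) % QUEUE_CAP] = ev;
    s->count++;
    update_irq(s);
    return true;
}

/* Guest fetches the oldest event; nothing is dropped until this is called. */
static bool event_fetch(EventSource *s, int *ev)
{
    assert(s != NULL && ev != NULL);
    if (s->count == 0) {
        return false;
    }
    *ev = s->events[s->head];
    s->head = (s->head + 1) % QUEUE_CAP;
    s->count--;
    update_irq(s);
    return true;
}
```

With these semantics, an event raised while firmware is running would still be queued, with the line still asserted, when the kernel starts servicing the interrupt; a single edge-triggered pulse offers no such guarantee.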

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [Qemu-ppc] [PATCH 3/4] spapr: disable hotplugging without OS

2017-05-24 Thread David Gibson
On Wed, May 24, 2017 at 11:28:57AM +0200, Greg Kurz wrote:
> On Wed, 24 May 2017 15:07:54 +1000
> David Gibson  wrote:
> 
> > On Tue, May 23, 2017 at 01:18:11PM +0200, Laurent Vivier wrote:
> > > If the OS is not started, QEMU sends an event to the OS
> > > that is lost and cannot be recovered. An unplug is not
> > > able to restore QEMU in a coherent state.
> > > So, while the OS is not started, disable CPU and memory hotplug.
> > > We use option vector 6 to know if the OS is started
> > > 
> > > Signed-off-by: Laurent Vivier   
> > 
> > Urgh.. I'm not terribly confident that this is really correct.  As
> > discussed on the previous patch, you're essentially using OV6 as a
> > flag that CAS is complete.
> > 
> > But while it undoubtedly makes the race window much smaller, I don't
> > see that there's any guarantee the guest OS will really be able to
> > handle hotplug events immediately after CAS.
> > 
> > In particular if the CAS process completes partially but then needs to
> > trigger a reboot, I think that would end up setting the ov6 variable,
> > but the OS would definitely not be in a state to accept events.
> > 
> 
> We never have any guarantee that the OS will process an event that
> we've sent actually (think of a kernel crash just after a successful
> CAS negotiation for example, or any failure with the various guest
> components involved in the process of hotplug).
> 
> > Mike, I really think we need some input from someone familiar with how
> > these hotplug events are supposed to work.  What do we need to do to
> > handle lost or stale events, such as those delivered when an OS is not
> > booted?
> > 
> 
> AFAIK, in the PowerVM world, the HMC exposes a user configurable timeout.
> 
> https://www.ibm.com/support/knowledgecenter/POWER8/p8hat/p8hat_dlparprocpoweraddp6.htm
> 
> I'm not sure we can do anything better than being able to "cancel" a previous
> hotplug attempt if it takes too long, but I'm not necessarily the expert you're
> looking for :)

Right, but at the moment we *don't* have a way to cancel a previous
hotplug attempt.  Trying to remove again ends up with things in a
tangle.

> 
> > > ---
> > >  hw/ppc/spapr.c | 22 +++---
> > >  1 file changed, 19 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > index eceb4cc..2e9320d 100644
> > > --- a/hw/ppc/spapr.c
> > > +++ b/hw/ppc/spapr.c
> > > @@ -2625,6 +2625,7 @@ out:
> > >  static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> > >                                    Error **errp)
> > >  {
> > > +    sPAPRMachineState *ms = SPAPR_MACHINE(hotplug_dev);
> > >      PCDIMMDevice *dimm = PC_DIMM(dev);
> > >      PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
> > >      MemoryRegion *mr = ddc->get_memory_region(dimm);
> > > @@ -2645,6 +2646,13 @@ static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> > >          goto out;
> > >      }
> > >  
> > > +    if (dev->hotplugged) {
> > > +        if (!ms->os_name) {
> > > +            error_setg(&local_err, "Memory hotplug not supported without OS");
> > > +            goto out;
> > > +        }
> > > +    }
> > > +
> > >  out:
> > >      error_propagate(errp, local_err);
> > >  }
> > > @@ -2874,6 +2882,7 @@ static void spapr_core_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> > >                                  Error **errp)
> > >  {
> > >      MachineState *machine = MACHINE(OBJECT(hotplug_dev));
> > > +    sPAPRMachineState *ms = SPAPR_MACHINE(machine);
> > >      MachineClass *mc = MACHINE_GET_CLASS(hotplug_dev);
> > >      Error *local_err = NULL;
> > >      CPUCore *cc = CPU_CORE(dev);
> > > @@ -2884,9 +2893,16 @@ static void spapr_core_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> > >      int node_id;
> > >      int index;
> > >  
> > > -    if (dev->hotplugged && !mc->has_hotpluggable_cpus) {
> > > -        error_setg(&local_err, "CPU hotplug not supported for this machine");
> > > -        goto out;
> > > +    if (dev->hotplugged) {
> > > +        if (!mc->has_hotpluggable_cpus) {
> > > +            error_setg(&local_err,
> > > +                       "CPU hotplug not supported for this machine");
> > > +            goto out;
> > > +        }
> > > +        if (!ms->os_name) {
> > > +            error_setg(&local_err, "CPU hotplug not supported without OS");
> > > +            goto out;
> > > +        }
> > >      }
> > >  
> > >      if (strcmp(base_core_type, type)) {
> > 
> 





Re: [Qemu-devel] [Qemu-ppc] [PATCH 3/4] spapr: disable hotplugging without OS

2017-05-24 Thread David Gibson
On Wed, May 24, 2017 at 12:40:37PM -0500, Michael Roth wrote:
> Quoting Laurent Vivier (2017-05-24 11:02:30)
> > On 24/05/2017 17:54, Greg Kurz wrote:
> > > On Wed, 24 May 2017 12:14:02 +0200
> > > Igor Mammedov  wrote:
> > > 
> > >> On Wed, 24 May 2017 11:28:57 +0200
> > >> Greg Kurz  wrote:
> > >>
> > >>> On Wed, 24 May 2017 15:07:54 +1000
> > >>> David Gibson  wrote:
> > >>>   
> >  On Tue, May 23, 2017 at 01:18:11PM +0200, Laurent Vivier wrote:
> > > If the OS is not started, QEMU sends an event to the OS
> > > that is lost and cannot be recovered. An unplug is not
> > > able to restore QEMU in a coherent state.
> > > So, while the OS is not started, disable CPU and memory hotplug.
> > > We use option vector 6 to know if the OS is started
> > >
> > > Signed-off-by: Laurent Vivier   
> > 
> >  Urgh.. I'm not terribly confident that this is really correct.  As
> >  discussed on the previous patch, you're essentially using OV6 as a
> >  flag that CAS is complete.
> > 
> >  But while it undoubtedly makes the race window much smaller, I don't
> >  see that there's any guarantee the guest OS will really be able to
> >  handle hotplug events immediately after CAS.
> > 
> >  In particular if the CAS process completes partially but then needs to
> >  trigger a reboot, I think that would end up setting the ov6 variable,
> >  but the OS would definitely not be in a state to accept events.  
> > >> wouldn't guest on reboot pick up updated fdt and online hotplugged
> > >> before crash cpu along with initial cpus?
> > >>
> > > 
> > > Yes and that's what actually happens with cpus.
> > > 
> > > But catching up with the background for this series, I have the
> > > impression that the issue isn't the fact we lose an event if the OS
> > > isn't started (which is not true), but more something wrong happening
> > > when hotplugging+unplugging memory as described in this commit:
> > > 
> > > commit fe6824d12642b005c69123ecf8631f9b13553f8b
> > > Author: Laurent Vivier 
> > > Date:   Tue Mar 28 14:09:34 2017 +0200
> > > 
> > > spapr: fix memory hot-unplugging
> > > 
> > 
> > Yes, this commit tries to fix that, but it's not possible. Some objects
> > remain in memory: you can see with "info cpus" or "info memory-devices"
> > that they are not really removed, which prevents hotplugging them
> > again, and in the case of memory hot-unplug we can also rerun
> > the device_del and crash QEMU (as before the fix).
> > 
> > Moreover, none of the state normally cleared in detach() gets cleared, and
> > we can't do it later in set_allocation_state() because some of it is in use
> > by the kernel, and this is the last call from the kernel.
> 
> Focusing on the hotplug/add case, it's a bit odd that the guest would be
> using the memory even though the hotplug event is clearly still sitting
> in the queue.
> 
> I think part of the issue is us not having a clear enough distinction in
> the code between what constitutes the need for "boot-time" handling vs.
> "hotplug" handling.
> 
> We have this hook in spapr_add_lmbs:
> 
> if (!dev->hotplugged) {
>     /* guests expect coldplugged LMBs to be pre-allocated */
>     drck->set_allocation_state(drc, SPAPR_DR_ALLOCATION_STATE_USABLE);
>     drck->set_isolation_state(drc, SPAPR_DR_ISOLATION_STATE_UNISOLATED);
> }
> 
> Whereas the default allocation/isolation state for LMBs in spapr_drc.c is
> UNUSABLE/ISOLATED, which is what covers the dev->hotplugged == true case.
> 
> I need to spend some time testing to confirm, but trying to walk through the
> various scenarios looking at the code:
> 
> case 1)
> 
> If the hotplug occurs before reset (not sure how likely this is), the event
> will get dropped by the reset handler, and the DRC stuff will be left in
> UNUSABLE/ISOLATED. I think it's more appropriate to treat this as "boot-time"
> and set it to USABLE/UNISOLATED like the !dev->hotplugged case.

Right.  It looks like we might need to go through all DRCs and sanitize
their state at reset time.  Essentially whatever their state before
the reset, they should appear as cold-plugged after the reset, I
think.
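
A sketch of the reset-time sanitizing suggested above, using a hypothetical `DrcModel` array instead of QEMU's real DRC objects; the two enums only mirror the SPAPR_DR_ALLOCATION_STATE_* and SPAPR_DR_ISOLATION_STATE_* pairs discussed in this thread. Every connector with a device attached ends up USABLE/UNISOLATED, as a cold-plugged LMB would, empty connectors return to UNUSABLE/ISOLATED, and in-flight events are forgotten. Pending unplug requests are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the DRC allocation/isolation states. */
typedef enum { ALLOC_UNUSABLE, ALLOC_USABLE } AllocState;
typedef enum { ISO_ISOLATED, ISO_UNISOLATED } IsoState;

typedef struct {
    bool has_device;      /* a device is attached to this connector */
    bool hotplugged;      /* device arrived after machine init */
    bool event_pending;   /* hotplug event not yet handled by the guest */
    AllocState alloc;
    IsoState iso;
} DrcModel;

/*
 * On reset, make every DRC look as if its device had been cold-plugged,
 * whatever state it was in before the reset.
 */
static void drcs_sanitize_on_reset(DrcModel *drcs, size_t n)
{
    assert(drcs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        DrcModel *d = &drcs[i];
        d->event_pending = false;   /* the new DT describes the device instead */
        d->hotplugged = false;
        if (d->has_device) {
            d->alloc = ALLOC_USABLE;
            d->iso = ISO_UNISOLATED;
        } else {
            d->alloc = ALLOC_UNUSABLE;
            d->iso = ISO_ISOLATED;
        }
    }
}
```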

> case 2)
> 
> If the hotplug occurs after reset, but before CAS,
> spapr_populate_drconf_memory will be called to populate the DT with all active
> LMBs. AFAICT, for hotplugged LMBs it marks everything where
> memory_region_present(get_system_memory(), addr) == true as
> SPAPR_LMB_FLAGS_ASSIGNED. Since the region is mapped regardless of whether the
> guest has acknowledged the hotplug, I think this would end up presenting the
> LMB as having been present at boot-time. However, they will still be in the
> UNUSABLE/ISOLATED state because dev->hotplugged == true.
> 
> I would think that the delayed hotplug event would move them to the appropriate
> state later, allowing the unplug to succeed later, but it's totally possible the
> guest code bails out during the hotplug path since it already has t

Re: [Qemu-devel] [Qemu-ppc] [PATCH 3/4] spapr: disable hotplugging without OS

2017-05-24 Thread Michael Roth
Quoting Laurent Vivier (2017-05-24 11:02:30)
> On 24/05/2017 17:54, Greg Kurz wrote:
> > On Wed, 24 May 2017 12:14:02 +0200
> > Igor Mammedov  wrote:
> > 
> >> On Wed, 24 May 2017 11:28:57 +0200
> >> Greg Kurz  wrote:
> >>
> >>> On Wed, 24 May 2017 15:07:54 +1000
> >>> David Gibson  wrote:
> >>>   
>  On Tue, May 23, 2017 at 01:18:11PM +0200, Laurent Vivier wrote:
> > If the OS is not started, QEMU sends an event to the OS
> > that is lost and cannot be recovered. An unplug is not
> > able to restore QEMU in a coherent state.
> > So, while the OS is not started, disable CPU and memory hotplug.
> > We use option vector 6 to know if the OS is started
> >
> > Signed-off-by: Laurent Vivier   
> 
>  Urgh.. I'm not terribly confident that this is really correct.  As
>  discussed on the previous patch, you're essentially using OV6 as a
>  flag that CAS is complete.
> 
>  But while it undoubtedly makes the race window much smaller, I don't
>  see that there's any guarantee the guest OS will really be able to
>  handle hotplug events immediately after CAS.
> 
>  In particular if the CAS process completes partially but then needs to
>  trigger a reboot, I think that would end up setting the ov6 variable,
>  but the OS would definitely not be in a state to accept events.  
> >> wouldn't guest on reboot pick up updated fdt and online hotplugged
> >> before crash cpu along with initial cpus?
> >>
> > 
> > Yes and that's what actually happens with cpus.
> > 
> > But catching up with the background for this series, I have the
> > impression that the issue isn't the fact we lose an event if the OS
> > isn't started (which is not true), but more something wrong happening
> > when hotplugging+unplugging memory as described in this commit:
> > 
> > commit fe6824d12642b005c69123ecf8631f9b13553f8b
> > Author: Laurent Vivier 
> > Date:   Tue Mar 28 14:09:34 2017 +0200
> > 
> > spapr: fix memory hot-unplugging
> > 
> 
> Yes, this commit tries to fix that, but it's not possible. Some objects
> remain in memory: you can see with "info cpus" or "info memory-devices"
> that they are not really removed, which prevents hotplugging them
> again, and in the case of memory hot-unplug we can also rerun
> the device_del and crash QEMU (as before the fix).
> 
> Moreover, none of the state normally cleared in detach() gets cleared, and
> we can't do it later in set_allocation_state() because some of it is in use
> by the kernel, and this is the last call from the kernel.

Focusing on the hotplug/add case, it's a bit odd that the guest would be
using the memory even though the hotplug event is clearly still sitting
in the queue.

I think part of the issue is us not having a clear enough distinction in
the code between what constitutes the need for "boot-time" handling vs.
"hotplug" handling.

We have this hook in spapr_add_lmbs:

if (!dev->hotplugged) {
    /* guests expect coldplugged LMBs to be pre-allocated */
    drck->set_allocation_state(drc, SPAPR_DR_ALLOCATION_STATE_USABLE);
    drck->set_isolation_state(drc, SPAPR_DR_ISOLATION_STATE_UNISOLATED);
}

Whereas the default allocation/isolation state for LMBs in spapr_drc.c is
UNUSABLE/ISOLATED, which is what covers the dev->hotplugged == true case.

I need to spend some time testing to confirm, but trying to walk through the
various scenarios looking at the code:

case 1)

If the hotplug occurs before reset (not sure how likely this is), the event
will get dropped by the reset handler, and the DRC stuff will be left in
UNUSABLE/ISOLATED. I think it's more appropriate to treat this as "boot-time"
and set it to USABLE/UNISOLATED like the !dev->hotplugged case.

case 2)

If the hotplug occurs after reset, but before CAS,
spapr_populate_drconf_memory will be called to populate the DT with all active
LMBs. AFAICT, for hotplugged LMBs it marks everything where
memory_region_present(get_system_memory(), addr) == true as
SPAPR_LMB_FLAGS_ASSIGNED. Since the region is mapped regardless of whether the
guest has acknowledged the hotplug, I think this would end up presenting the
LMB as having been present at boot-time. However, they will still be in the
UNUSABLE/ISOLATED state because dev->hotplugged == true.

I would think that the delayed hotplug event would move them to the appropriate
state later, allowing the unplug to succeed later, but it's totally possible the
guest code bails out during the hotplug path since it already has the LMB marked
as being in use via the CAS-generated DT.
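
The ASSIGNED decision in case 2 can be reduced to a small pure function. This sketch, with hypothetical names and an illustrative flag value, contrasts the presence-only test described above with a variant that also consults the DRC allocation state, so that a mapped but unacknowledged hotplugged LMB is not advertised in the CAS-generated DT and is instead left to its deferred hotplug event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LMB_ASSIGNED 0x8u   /* illustrative flag bit for this sketch */

typedef enum { ALLOC_UNUSABLE, ALLOC_USABLE } AllocState;

/* Behaviour described in this thread: being mapped is enough. */
static uint32_t lmb_flags_present_only(bool mapped)
{
    return mapped ? LMB_ASSIGNED : 0;
}

/*
 * Variant: only advertise LMBs the DRC already considers usable, so a
 * mapped but not yet acknowledged hotplugged LMB stays unassigned.
 */
static uint32_t lmb_flags_drc_aware(bool mapped, AllocState alloc)
{
    assert(alloc == ALLOC_UNUSABLE || alloc == ALLOC_USABLE);
    return (mapped && alloc == ALLOC_USABLE) ? LMB_ASSIGNED : 0;
}
```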

So it seems like we need to either:

a) not mark these LMBs as SPAPR_LMB_FLAGS_ASSIGNED in the DT and let them get
picked up by the deferred hotplug event (which seems to also be in need of an
extra IRQ pulse given that it's not getting picked up till later), or

b) let them get picked up as boot-time LMBs and add a CAS hook to move the
state to USABLE/UNISOLATED at that point. Optionally, we could also purge any
pending

Re: [Qemu-devel] [Qemu-ppc] [PATCH 3/4] spapr: disable hotplugging without OS

2017-05-24 Thread Laurent Vivier
On 24/05/2017 17:54, Greg Kurz wrote:
> On Wed, 24 May 2017 12:14:02 +0200
> Igor Mammedov  wrote:
> 
>> On Wed, 24 May 2017 11:28:57 +0200
>> Greg Kurz  wrote:
>>
>>> On Wed, 24 May 2017 15:07:54 +1000
>>> David Gibson  wrote:
>>>   
 On Tue, May 23, 2017 at 01:18:11PM +0200, Laurent Vivier wrote:
> If the OS is not started, QEMU sends an event to the OS
> that is lost and cannot be recovered. An unplug is not
> able to restore QEMU in a coherent state.
> So, while the OS is not started, disable CPU and memory hotplug.
> We use option vector 6 to know if the OS is started
>
> Signed-off-by: Laurent Vivier   

 Urgh.. I'm not terribly confident that this is really correct.  As
 discussed on the previous patch, you're essentially using OV6 as a
 flag that CAS is complete.

 But while it undoubtedly makes the race window much smaller, I don't
 see that there's any guarantee the guest OS will really be able to
 handle hotplug events immediately after CAS.

 In particular if the CAS process completes partially but then needs to
 trigger a reboot, I think that would end up setting the ov6 variable,
 but the OS would definitely not be in a state to accept events.  
>> wouldn't guest on reboot pick up updated fdt and online hotplugged
>> before crash cpu along with initial cpus?
>>
> 
> Yes and that's what actually happens with cpus.
> 
> But catching up with the background for this series, I have the
> impression that the issue isn't the fact we lose an event if the OS
> isn't started (which is not true), but more something wrong happening
> when hotplugging+unplugging memory as described in this commit:
> 
> commit fe6824d12642b005c69123ecf8631f9b13553f8b
> Author: Laurent Vivier 
> Date:   Tue Mar 28 14:09:34 2017 +0200
> 
> spapr: fix memory hot-unplugging
> 

Yes, this commit tries to fix that, but it's not possible. Some objects
remain in memory: you can see with "info cpus" or "info memory-devices"
that they are not really removed, which prevents hotplugging them
again, and in the case of memory hot-unplug we can also rerun
the device_del and crash QEMU (as before the fix).

Moreover, none of the state normally cleared in detach() gets cleared, and
we can't do it later in set_allocation_state() because some of it is in use
by the kernel, and this is the last call from the kernel.

Laurent



Re: [Qemu-devel] [Qemu-ppc] [PATCH 3/4] spapr: disable hotplugging without OS

2017-05-24 Thread Greg Kurz
On Wed, 24 May 2017 12:14:02 +0200
Igor Mammedov  wrote:

> On Wed, 24 May 2017 11:28:57 +0200
> Greg Kurz  wrote:
> 
> > On Wed, 24 May 2017 15:07:54 +1000
> > David Gibson  wrote:
> >   
> > > On Tue, May 23, 2017 at 01:18:11PM +0200, Laurent Vivier wrote:
> > > > If the OS is not started, QEMU sends an event to the OS
> > > > that is lost and cannot be recovered. An unplug is not
> > > > able to restore QEMU in a coherent state.
> > > > So, while the OS is not started, disable CPU and memory hotplug.
> > > > We use option vector 6 to know if the OS is started
> > > > 
> > > > Signed-off-by: Laurent Vivier   
> > > 
> > > Urgh.. I'm not terribly confident that this is really correct.  As
> > > discussed on the previous patch, you're essentially using OV6 as a
> > > flag that CAS is complete.
> > > 
> > > But while it undoubtedly makes the race window much smaller, I don't
> > > see that there's any guarantee the guest OS will really be able to
> > > handle hotplug events immediately after CAS.
> > > 
> > > In particular if the CAS process completes partially but then needs to
> > > trigger a reboot, I think that would end up setting the ov6 variable,
> > > but the OS would definitely not be in a state to accept events.  
> wouldn't the guest, on reboot, pick up the updated fdt and online the cpu
> hotplugged before the crash along with the initial cpus?
> 

Yes, and that's what actually happens with cpus.

But catching up with the background for this series, I have the
impression that the issue isn't that we lose an event if the OS
isn't started (which is not true), but rather that something goes wrong
when hotplugging+unplugging memory, as described in this commit:

commit fe6824d12642b005c69123ecf8631f9b13553f8b
Author: Laurent Vivier 
Date:   Tue Mar 28 14:09:34 2017 +0200

spapr: fix memory hot-unplugging

> > We never actually have any guarantee that the OS will process an event
> > that we've sent (think of a kernel crash just after a successful CAS
> > negotiation, for example, or any failure in the various guest
> > components involved in the hotplug process).
> >   
> > > Mike, I really think we need some input from someone familiar with how
> > > these hotplug events are supposed to work.  What do we need to do to
> > > handle lost or stale events, such as those delivered when an OS is not
> > > booted.
> > > 
> > 
> > AFAIK, in the PowerVM world, the HMC exposes a user-configurable timeout.
> > 
> > https://www.ibm.com/support/knowledgecenter/POWER8/p8hat/p8hat_dlparprocpoweraddp6.htm
> > 
> > I'm not sure we can do anything better than being able to "cancel" a previous
> > hotplug attempt if it takes too long, but I'm not necessarily the expert you're
> > looking for :)
> From the x86/ACPI world:
>  - if hotplug happens early at boot, before the guest OS is running, the
>    hotplug notification (SCI interrupt) stays pending, and once the guest
>    is up it will/should handle it and online the CPU
>  - if the guest crashed and is rebooted, it will pick up the updated ACPI tables
>    (fdt equivalent) with all present cpus (including the one hotplugged before
>    the crash) and online the hotplugged cpu along with the coldplugged ones
>  - if the guest loses the SCI somehow, it's considered a guest issue, and such a
>    cpu cannot be unplugged until the guest picks it up somehow (reboot, manually
>    running the cpus scan method from ACPI, or another cpu hotplug event) and
>    explicitly ejects it.
> 
> Taking into account that CPUs don't support surprise removal and require
> guest cooperation, it's fine to leave the CPU plugged in until the guest ejects it.
> That's what I'd expect to happen on bare metal:
> you hotplug a CPU, the hardware notifies the OS about it and that's all;
> the cpu won't suddenly pop out if the OS isn't able to online it.
> 
> Moreover, the hotplugged cpu might be executing some code, or one of the
> already present cpus might be executing initialization routines to online
> it (think of host overcommit and arbitrary delays), so it is not really safe
> to remove a hotplugged but not yet onlined cpu without OS consent
> (i.e. an explicit eject by OS/firmware). I think the lost event handling should
> be fixed on the guest side and not in QEMU.
> 
> 





Re: [Qemu-devel] [Qemu-ppc] [PATCH 3/4] spapr: disable hotplugging without OS

2017-05-24 Thread Igor Mammedov
On Wed, 24 May 2017 11:28:57 +0200
Greg Kurz  wrote:

> On Wed, 24 May 2017 15:07:54 +1000
> David Gibson  wrote:
> 
> > On Tue, May 23, 2017 at 01:18:11PM +0200, Laurent Vivier wrote:  
> > > If the OS is not started, QEMU sends an event to the OS
> > > that is lost and cannot be recovered. An unplug is not
> > > able to restore QEMU in a coherent state.
> > > So, while the OS is not started, disable CPU and memory hotplug.
> > > We use option vector 6 to know if the OS is started
> > > 
> > > Signed-off-by: Laurent Vivier 
> > 
> > Urgh.. I'm not terribly confident that this is really correct.  As
> > discussed on the previous patch, you're essentially using OV6 as a
> > flag that CAS is complete.
> > 
> > But while it undoubtedly makes the race window much smaller, I don't
> > see that there's any guarantee the guest OS will really be able to
> > handle hotplug events immediately after CAS.
> > 
> > In particular if the CAS process completes partially but then needs to
> > trigger a reboot, I think that would end up setting the ov6 variable,
> > but the OS would definitely not be in a state to accept events.
wouldn't the guest, on reboot, pick up the updated fdt and online the cpu
hotplugged before the crash along with the initial cpus?

> We never actually have any guarantee that the OS will process an event
> that we've sent (think of a kernel crash just after a successful CAS
> negotiation, for example, or any failure in the various guest
> components involved in the hotplug process).
> 
> > Mike, I really think we need some input from someone familiar with how
> > these hotplug events are supposed to work.  What do we need to do to
> > handle lost or stale events, such as those delivered when an OS is not
> > booted.
> >   
> 
> AFAIK, in the PowerVM world, the HMC exposes a user-configurable timeout.
> 
> https://www.ibm.com/support/knowledgecenter/POWER8/p8hat/p8hat_dlparprocpoweraddp6.htm
> 
> I'm not sure we can do anything better than being able to "cancel" a previous
> hotplug attempt if it takes too long, but I'm not necessarily the expert you're
> looking for :)
From the x86/ACPI world:
 - if hotplug happens early at boot, before the guest OS is running, the
   hotplug notification (SCI interrupt) stays pending, and once the guest
   is up it will/should handle it and online the CPU
 - if the guest crashed and is rebooted, it will pick up the updated ACPI tables
   (fdt equivalent) with all present cpus (including the one hotplugged before
   the crash) and online the hotplugged cpu along with the coldplugged ones
 - if the guest loses the SCI somehow, it's considered a guest issue, and such a
   cpu cannot be unplugged until the guest picks it up somehow (reboot, manually
   running the cpus scan method from ACPI, or another cpu hotplug event) and
   explicitly ejects it.

Taking into account that CPUs don't support surprise removal and require
guest cooperation, it's fine to leave the CPU plugged in until the guest ejects it.
That's what I'd expect to happen on bare metal:
you hotplug a CPU, the hardware notifies the OS about it and that's all;
the cpu won't suddenly pop out if the OS isn't able to online it.

Moreover, the hotplugged cpu might be executing some code, or one of the
already present cpus might be executing initialization routines to online
it (think of host overcommit and arbitrary delays), so it is not really safe
to remove a hotplugged but not yet onlined cpu without OS consent
(i.e. an explicit eject by OS/firmware). I think the lost event handling should
be fixed on the guest side and not in QEMU.
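
A minimal C sketch of the x86/ACPI behavior described above, written as a toy state machine rather than real QEMU or ACPI code: the `ModelCpu` type and all `model_*` functions are hypothetical names introduced only for illustration. It captures the three properties the argument relies on: a notification raised before the guest runs stays pending, a guest scan onlines every present CPU, and a host-side unplug request alone never removes the CPU.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one hotpluggable CPU (hypothetical, not QEMU's data structures). */
typedef struct {
    bool present;         /* inserted by the host */
    bool notify_pending;  /* SCI-style notification not yet handled by the guest */
    bool online;          /* onlined by the guest OS */
    bool eject_requested; /* host asked for removal; guest has not ejected yet */
} ModelCpu;

/* Host: plug the CPU and raise a notification, whether or not the guest runs. */
static void model_hotplug(ModelCpu *cpu)
{
    cpu->present = true;
    cpu->notify_pending = true;
}

/* Guest: on boot/reboot or on a notification, scan present CPUs and online
 * any that are not online yet; this also consumes the pending notification. */
static void model_guest_scan(ModelCpu *cpu)
{
    if (cpu->present && !cpu->online) {
        cpu->online = true;
    }
    cpu->notify_pending = false;
}

/* Host: request removal; the CPU stays present (no surprise removal). */
static void model_unplug_request(ModelCpu *cpu)
{
    cpu->eject_requested = true;
}

/* Guest: offline the CPU and explicitly eject it; only now is it gone. */
static void model_guest_eject(ModelCpu *cpu)
{
    assert(cpu->eject_requested);
    cpu->online = false;
    cpu->present = false;
    cpu->eject_requested = false;
}
```

For example, plugging before the guest boots leaves `present` and `notify_pending` set; a later `model_guest_scan()` onlines the CPU, and after `model_unplug_request()` the CPU is still present and online until `model_guest_eject()` runs.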




Re: [Qemu-devel] [Qemu-ppc] [PATCH 3/4] spapr: disable hotplugging without OS

2017-05-24 Thread Greg Kurz
On Wed, 24 May 2017 15:07:54 +1000
David Gibson  wrote:

> On Tue, May 23, 2017 at 01:18:11PM +0200, Laurent Vivier wrote:
> > If the OS is not started, QEMU sends an event to the OS
> > that is lost and cannot be recovered. An unplug is not
> > able to restore QEMU in a coherent state.
> > So, while the OS is not started, disable CPU and memory hotplug.
> > We use option vector 6 to know if the OS is started
> > 
> > Signed-off-by: Laurent Vivier   
> 
> Urgh.. I'm not terribly confident that this is really correct.  As
> discussed on the previous patch, you're essentially using OV6 as a
> flag that CAS is complete.
> 
> But while it undoubtedly makes the race window much smaller, I don't
> see that there's any guarantee the guest OS will really be able to
> handle hotplug events immediately after CAS.
> 
> In particular if the CAS process completes partially but then needs to
> trigger a reboot, I think that would end up setting the ov6 variable,
> but the OS would definitely not be in a state to accept events.
> 

We never actually have any guarantee that the OS will process an event
that we've sent (think of a kernel crash just after a successful CAS
negotiation, for example, or any failure in the various guest
components involved in the hotplug process).

> Mike, I really think we need some input from someone familiar with how
> these hotplug events are supposed to work.  What do we need to do to
> handle lost or stale events, such as those delivered when an OS is not
> booted.
> 

AFAIK, in the PowerVM world, the HMC exposes a user-configurable timeout.

https://www.ibm.com/support/knowledgecenter/POWER8/p8hat/p8hat_dlparprocpoweraddp6.htm

I'm not sure we can do anything better than being able to "cancel" a previous
hotplug attempt if it takes too long, but I'm not necessarily the expert you're
looking for :)
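
One way to picture the "cancel after a timeout" idea is the toy C model below. It is an assumption-laden sketch, not the HMC's or QEMU's actual mechanism: `HotplugRequest`, `req_start()`, `req_guest_ack()` and `req_try_cancel()` are hypothetical names, and the timeout is simply a caller-supplied number of milliseconds. A request becomes cancellable only once its deadline has passed without an acknowledgement from the guest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical request states; not taken from PowerVM or QEMU. */
typedef enum { REQ_PENDING, REQ_DONE, REQ_CANCELLED } ReqState;

typedef struct {
    ReqState state;
    uint64_t deadline_ms; /* absolute time after which a cancel is allowed */
} HotplugRequest;

/* Management side: send the hotplug event and arm the user-configured timeout. */
static void req_start(HotplugRequest *r, uint64_t now_ms, uint64_t timeout_ms)
{
    assert(timeout_ms > 0);
    r->state = REQ_PENDING;
    r->deadline_ms = now_ms + timeout_ms;
}

/* Guest side: the OS has processed the event; a late ack after a cancel is ignored. */
static void req_guest_ack(HotplugRequest *r)
{
    if (r->state == REQ_PENDING) {
        r->state = REQ_DONE;
    }
}

/* Management side: cancel only a request that is still pending after its
 * deadline. Returns true if the request was cancelled. */
static bool req_try_cancel(HotplugRequest *r, uint64_t now_ms)
{
    if (r->state == REQ_PENDING && now_ms >= r->deadline_ms) {
        r->state = REQ_CANCELLED;
        return true;
    }
    return false;
}
```

Whether such a cancel is safe is exactly what Igor questions in this thread: for CPUs he argues a plugged CPU should stay until the guest ejects it, so a cancel like this would only be reasonable for a request the guest has not started acting on.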

> > ---
> >  hw/ppc/spapr.c | 22 +++---
> >  1 file changed, 19 insertions(+), 3 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index eceb4cc..2e9320d 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -2625,6 +2625,7 @@ out:
> >  static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> >                                    Error **errp)
> >  {
> > +    sPAPRMachineState *ms = SPAPR_MACHINE(hotplug_dev);
> >      PCDIMMDevice *dimm = PC_DIMM(dev);
> >      PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
> >      MemoryRegion *mr = ddc->get_memory_region(dimm);
> > @@ -2645,6 +2646,13 @@ static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> >          goto out;
> >      }
> >  
> > +    if (dev->hotplugged) {
> > +        if (!ms->os_name) {
> > +            error_setg(&local_err, "Memory hotplug not supported without OS");
> > +            goto out;
> > +        }
> > +    }
> > +
> >  out:
> >      error_propagate(errp, local_err);
> >  }
> > @@ -2874,6 +2882,7 @@ static void spapr_core_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> >                                  Error **errp)
> >  {
> >      MachineState *machine = MACHINE(OBJECT(hotplug_dev));
> > +    sPAPRMachineState *ms = SPAPR_MACHINE(machine);
> >      MachineClass *mc = MACHINE_GET_CLASS(hotplug_dev);
> >      Error *local_err = NULL;
> >      CPUCore *cc = CPU_CORE(dev);
> > @@ -2884,9 +2893,16 @@ static void spapr_core_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> >      int node_id;
> >      int index;
> >  
> > -    if (dev->hotplugged && !mc->has_hotpluggable_cpus) {
> > -        error_setg(&local_err, "CPU hotplug not supported for this machine");
> > -        goto out;
> > +    if (dev->hotplugged) {
> > +        if (!mc->has_hotpluggable_cpus) {
> > +            error_setg(&local_err,
> > +                       "CPU hotplug not supported for this machine");
> > +            goto out;
> > +        }
> > +        if (!ms->os_name) {
> > +            error_setg(&local_err, "CPU hotplug not supported without OS");
> > +            goto out;
> > +        }
> >      }
> >  
> >      if (strcmp(base_core_type, type)) {
> 
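
The decision logic the patch adds to spapr_core_pre_plug() can be reduced to a small, self-contained C model for discussion. The `ModelMachine` type, the `PlugResult` codes and `model_core_pre_plug()` are hypothetical stand-ins, not QEMU API; `os_name` plays the role of the `ms->os_name` field the patch tests, which per the commit message is derived from option vector 6 during CAS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the machine state consulted by the patch. */
typedef struct {
    bool has_hotpluggable_cpus; /* mirrors mc->has_hotpluggable_cpus */
    const char *os_name;        /* mirrors ms->os_name: NULL until CAS sets it */
} ModelMachine;

typedef enum {
    PLUG_OK,
    PLUG_ERR_NO_HOTPLUG, /* "CPU hotplug not supported for this machine" */
    PLUG_ERR_NO_OS,      /* "CPU hotplug not supported without OS" */
} PlugResult;

/* Same order of checks as the patched spapr_core_pre_plug(): only devices
 * added at runtime (hotplugged) are subject to either check. */
static PlugResult model_core_pre_plug(const ModelMachine *m, bool hotplugged)
{
    assert(m != NULL);
    if (hotplugged) {
        if (!m->has_hotpluggable_cpus) {
            return PLUG_ERR_NO_HOTPLUG;
        }
        if (!m->os_name) {
            return PLUG_ERR_NO_OS;
        }
    }
    return PLUG_OK;
}
```

The model also makes David's objection concrete: the check only asks whether `os_name` has been set, not whether the guest can handle the event at that moment, so in the partial-CAS-then-reboot case he describes the flag can be set while no OS is ready to receive events.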


