We use Fuel for deployment, with a fairly simple network configuration (the controller and network node are the same host) and OpenDaylight as the Neutron driver. However, we also have SR-IOV configured for some NICs, and there might be something interesting there.
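For context, the SR-IOV side of this is driven by the PCI passthrough whitelist in nova.conf plus the sriovnicswitch ML2 mechanism driver. A minimal sketch of what that typically looks like on Mitaka (the device name, physnet, and driver ordering below are placeholders, not our actual values):

    # /etc/nova/nova.conf on the compute node (Mitaka-era syntax)
    [DEFAULT]
    pci_passthrough_whitelist = {"devname": "eth3", "physical_network": "physnet2"}

    # /etc/neutron/plugins/ml2/ml2_conf.ini -- sriovnicswitch enabled
    # alongside the OpenDaylight mechanism driver
    [ml2]
    mechanism_drivers = opendaylight,sriovnicswitch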
The instance was created with an SR-IOV port, and in the logs I see "Assigning a pci device without numa affinity to instance 389109a4-540e-48d9-82b1-873b02cb4d31 which has numa topology". Shortly after that, creation fails and the hypervisor seems to crash.

So today I tried to create an instance without an SR-IOV port but with hw:cpu_policy=dedicated, and it worked fine. Then I did the same but added an SR-IOV port, and I got the same crash (though not across all nodes this time...). I assume we have some kind of misconfiguration somewhere, though the entire hypervisor crashing doesn't seem right either :-) (A couple of concrete command sketches for digging into this are included after the quoted thread below.)

/Tomas

On 17 September 2017 at 00:32, Steve Gordon <[email protected]> wrote:

> ----- Original Message -----
> > From: "Tomas Brännström" <[email protected]>
> > To: [email protected]
> > Sent: Friday, September 15, 2017 5:56:34 AM
> > Subject: [Openstack] QEMU/KVM crash when mixing cpu_policy:dedicated
> > and non-dedicated flavors?
> >
> > Hi
> >
> > I just noticed a strange (?) issue when I tried to create an instance
> > with a flavor with hw:cpu_policy=dedicated. The instance failed with
> > the error:
> >
> > Unable to read from monitor: Connection reset by peer', u'code': 500,
> > u'details': u' File
> > "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1926,
> > in _do_build_and_run_instance\n filter_properties)
> > File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line
> > 2116, in _build_and_run_instance\n instance_uuid=instance.uuid,
> > reason=six.text_type(e))
> >
> > And all other instances were shut down, even those living on a
> > different compute host than the one the new instance was scheduled to.
> > A quick googling reveals that this could be due to the hypervisor
> > crashing (though why would it crash on unrelated compute hosts??).
>
> Are there any more specific messages in the system logs or elsewhere?
> Check /var/log/libvirt/* in particular; though I suspect it will be the
> original source of the above message, it may have some additional
> useful information earlier.
>
> > The only odd thing here that I can think of was that the existing
> > instances did -not- use a dedicated cpu policy -- can there be
> > problems like this when attempting to mix dedicated and non-dedicated
> > policies?
>
> The main problem if you mix them *on the same node* is that Nova won't
> account properly for this when placing guests: the current design
> assumes that a node will be used either for "normal" instances (with
> CPU overcommit) or "dedicated" instances (no CPU overcommit, pinning),
> and that the two will be separated via the use of host aggregates and
> flavors. This in and of itself should not result in a QEMU crash,
> though it may eventually result in issues w.r.t. balancing of
> scheduling/placement decisions. If instances on other nodes went down
> at the same time I'd be looking for a broader issue -- what is your
> storage and networking setup like?
>
> -Steve
>
> > This was with Mitaka.
> >
> > /Tomas
>
> --
> Steve Gordon,
> Principal Product Manager,
> Red Hat OpenStack Platform
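Following Steve's suggestion, these are the sorts of places worth checking on the crashing compute (the instance log name and PCI address below are placeholders, not real values from our setup):

    # libvirt daemon log and the per-instance QEMU log
    tail -n 200 /var/log/libvirt/libvirtd.log
    tail -n 200 /var/log/libvirt/qemu/instance-0000000f.log

    # NUMA node the SR-IOV VF's PCI device reports; -1 means the platform
    # exposes no affinity, which is typically what produces the nova
    # "without numa affinity" warning quoted above
    cat /sys/bus/pci/devices/0000:05:10.1/numa_node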
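And for reference, the host-aggregate separation Steve describes is usually wired up along these lines (aggregate, host, and flavor names are placeholders; this also assumes AggregateInstanceExtraSpecsFilter is enabled in the scheduler filters):

    # group the hosts reserved for pinned guests into their own aggregate
    nova aggregate-create pinned-hosts
    nova aggregate-set-metadata pinned-hosts pinned=true
    nova aggregate-add-host pinned-hosts compute-1

    # steer the dedicated flavor onto (and only onto) that aggregate
    nova flavor-key m1.dedicated set hw:cpu_policy=dedicated
    nova flavor-key m1.dedicated set aggregate_instance_extra_specs:pinned=true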
