Re: [Openstack-operators] Guest crash and KVM unhandled rdmsr

2017-10-18 Thread Blair Bethwaite
Hi Saverio,

On 13 October 2017 at 09:05, Saverio Proto wrote:
> I found this link in my browser history:
> https://bugs.launchpad.net/ubuntu/+source/kvm/+bug/1583819

Thanks. Yes, I have seen that one too.

> Is it the same messages that you are seeing in Xenial ?

There are a handful of different MSRs mentioned, and these are not
always the same across each "burst" of unhandled rdmsr log messages.

After a bit of further head-scratching we think the crashes are
actually kernel panics following SLUB memory allocation failures under
heavy Lustre workloads. We've now started patching the Lustre clients
for that and the guest crashes have stopped. At this point I don't have
any idea why this seems to correspond with rdmsr attempts though (and
in fact the rdmsr looks to happen just prior to the SLUB failures).
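In case anyone wants to check for the same pattern, this is roughly
what we grep for on the hypervisor (a sketch; the exact wording of the
kernel messages varies by kernel version):

  dmesg -T | egrep -i 'unhandled rdmsr|page allocation failure|SLUB'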

-- 
Cheers,
~Blairo



Re: [Openstack-operators] [ironic][nova][libvirt] Adding ironic to already-existing kvm deployment

2017-10-18 Thread Chris Friesen

On 10/18/2017 11:37 AM, Chris Apsey wrote:

> All,
>
> I'm working to add baremetal provisioning to an already-existing
> libvirt (kvm) deployment.  I was under the impression that our existing
> endpoints that already run nova-conductor/nova-scheduler/etc. could be
> modified to support both kvm and ironic, but after looking at the
> ironic installation guide
> (https://docs.openstack.org/ironic/latest/install/configure-compute.html),
> this doesn't appear to be the case.  Changes are made in the [DEFAULT]
> section that you obviously wouldn't want to apply to your virtual
> instances.
>
> Given that information, it would appear that ironic requires you to
> create an additional host running nova-compute, separate from your
> existing compute nodes, purely to manage the ironic-nova integration,
> which makes sense.


I think you could run nova-compute with a separate config file specified 
on the command line.  From what I understand, if you run it on the same 
host as the libvirt nova-compute you'd need to use a separate hostname 
for the ironic nova-compute, since nova uses the binary/hostname tuple 
to uniquely identify services in the DB.
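
A minimal sketch of what I mean (the file name and "host" value are made 
up for illustration, not from the ironic docs):

  # /etc/nova/nova-ironic.conf
  [DEFAULT]
  host = ironic-compute        # distinct from the libvirt nova-compute's hostname
  compute_driver = ironic.IronicDriver

  # then start a second service pointing at both files:
  nova-compute --config-file /etc/nova/nova.conf \
               --config-file /etc/nova/nova-ironic.conf

Later --config-file options override earlier ones, so the ironic-specific 
file should only need the settings that differ.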


Chris



[Openstack-operators] [ironic][nova][libvirt] Adding ironic to already-existing kvm deployment

2017-10-18 Thread Chris Apsey

All,

I'm working to add baremetal provisioning to an already-existing 
libvirt (kvm) deployment.  I was under the impression that our existing 
endpoints that already run nova-conductor/nova-scheduler/etc. could be 
modified to support both kvm and ironic, but after looking at the 
ironic installation guide 
(https://docs.openstack.org/ironic/latest/install/configure-compute.html), 
this doesn't appear to be the case.  Changes are made in the [DEFAULT] 
section that you obviously wouldn't want to apply to your virtual 
instances.


Given that information, it would appear that ironic requires you to 
create an additional host running nova-compute, separate from your 
existing compute nodes, purely to manage the ironic-nova integration, 
which makes sense.  However, the ironic documentation at 
https://docs.openstack.org/ironic/latest/install/configure-compute.html 
states that:


"The following configuration file must be modified on the Compute 
service’s controller nodes and compute nodes"


right before it lays out the minimum config requirements for nova <-> 
ironic integration, which suggests service requirements beyond just an 
extra nova-compute.  Do we also need to run separate instances of 
nova-scheduler, etc.?  I can't seem to find any documentation that 
speaks to this; the current ironic documentation seems to focus on a 
baremetal-only deployment scenario.
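
(For context, the conflicting bit is something like this in nova.conf -- 
my paraphrase of the guide, not a complete config:

  [DEFAULT]
  compute_driver = ironic.IronicDriver

which would obviously break a nova-compute that is supposed to keep 
driving libvirt/kvm.)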


Currently running Pike.

Thanks in advance,

--
v/r

Chris Apsey
bitskr...@bitskrieg.net
https://www.bitskrieg.net



[Openstack-operators] PCI pass through settings on a flavor without aliases on the API nodes

2017-10-18 Thread Van Leeuwen, Robert
Hi,

Does anyone know if it is possible to set PCI pass through on a flavor without 
also needing to set the alias on the nova API nodes as mentioned here:
https://docs.openstack.org/nova/pike/admin/pci-passthrough.html

E.g. you currently need to set in nova.conf:
[pci]
alias = { "vendor_id":"8086", "product_id":"154d", "device_type":"type-PF", 
"name":"a1" }

Then you can set the flavor:
openstack flavor set m1.large --property "pci_passthrough:alias"="a1:2"


I.e. I would be fine with just setting the PCI vendor/product on the flavor 
instead of also needing to set it on the API nodes.
So something like:
openstack flavor set m1.large --property "pci_passthrough:vendor"="8086" 
--property "pci_passthrough:device"="154d:1"
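
(For completeness, on the compute nodes the device also has to be 
whitelisted, something like:

  [pci]
  passthrough_whitelist = { "vendor_id": "8086", "product_id": "154d" }

so the vendor/product pair already lives in two places. The flavor-only 
syntax above is just what I would like to exist; as far as I know it is 
not valid today.)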

Thx,
Robert van Leeuwen


Re: [Openstack-operators] Guest crash and KVM unhandled rdmsr

2017-10-18 Thread Arne Wiebalck
Blair,

We’ve seen these errors in our deployment as well, on CentOS 7.3 with 3.10 
kernels, when looking into instance issues. So far we’ve always discarded 
them as not relevant to the problems observed, so I’d be very interested 
to hear if it turns out they should not be ignored.

Cheers,
 Arne


On 17 Oct 2017, at 18:52, George Mihaiescu wrote:

Hi Blair,

We had a few cases of compute nodes hanging with the last log in syslog being 
related to "rdmsr", and requiring hard reboots:
 kvm [29216]: vcpu0 unhandled rdmsr: 0x345

The workloads are probably similar to yours (SGE workers doing genomics) with 
CPU mode host-passthrough, on top of Ubuntu 16.04 and kernel 4.4.0-96-generic.

I'm not sure the "rdmsr" logs are relevant though, because we see them on other 
 compute nodes that have no issues.
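
(As an aside, 0x345 should be IA32_PERF_CAPABILITIES. If msr-tools is 
installed you can read it directly on the host to see what the hardware 
reports, e.g.:

  sudo modprobe msr
  sudo rdmsr -p 0 0x345

An "unhandled rdmsr" just means KVM didn't emulate that register for the 
guest, which by itself is usually harmless.)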

Did you find anything that might indicate what the root cause is?

Cheers,
George


On Thu, Oct 12, 2017 at 5:26 PM, Blair Bethwaite wrote:
Hi all,

Has anyone seen guest crashes/freezes associated with KVM unhandled rdmsr 
messages in dmesg on the hypervisor?

We have seen these messages before but never with a strong correlation to 
guest problems. However, over the past couple of weeks this has been 
happening almost daily, with consistent correlation, on a set of hosts 
dedicated to a particular HPC workload. As far as I know the workload has 
not changed, but we have just recently moved the hypervisors to Ubuntu 
Xenial (though they were already on the Xenial kernel previously) and done 
minor guest (CentOS7) updates. CPU mode is host-passthrough. Currently 
trying to figure out if the CPU flags in the guest have changed since the 
host upgrade...
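
In case it's useful to anyone else, the comparison I have in mind is just 
sorting the guest's CPU flags before and after and diffing them, e.g. 
inside the guest:

  grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | sort > /tmp/flags.after
  diff /tmp/flags.before /tmp/flags.after

(/tmp/flags.before being a capture taken against a pre-upgrade host.)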

Cheers,


--
Arne Wiebalck
CERN IT



[Openstack-operators] Openstack-ansible and HAProxy

2017-10-18 Thread Matteo Gerola
Dear all,
Hope this is the right ML for this question, otherwise please let me know.

I have set up an openstack-ansible based infrastructure (Pike) with 3 
controllers and 8 computes. Everything works fine except when I configure 
a dedicated IP to be shared between the ctrls.

There are several bridges, but the ones involved here are:

br-public (bridge toward the internet): one public IP per ctrl 
(x.x.x.1, x.x.x.2, x.x.x.3), configured on the bridge.
br-mgmt (default internal bridge for the openstack-ansible setup): one 
private IP per ctrl (y.y.y.1, y.y.y.2, y.y.y.3), configured on the bridge.
Then I have configured the openstack-ansible files like this:

/etc/openstack_deploy/openstack_user_config.yml
  internal_lb_vip_address: y.y.y.1
(I'm using the first ctrl IP for the internal network, but I'm not sure 
it's the right config.)
  external_lb_vip_address: x.x.x.4
(I'm using a free IP in the public network to be shared by the controllers.)

/etc/openstack_deploy/user_variables.yml
  haproxy_keepalived_external_vip_cidr: "{{ external_lb_vip_address }}/24"
  haproxy_keepalived_internal_vip_cidr: "{{ internal_lb_vip_address }}/24"
(For these two, do I have to put /32 (the default) or /24 (my CIDR)?)
  haproxy_keepalived_external_interface: br-public
  haproxy_keepalived_internal_interface: br-mgmt

With netstat I see HAProxy binding all the service ports on the ctrls, but 
I cannot ping the public IP x.x.x.4, or reach Horizon or the other 
services through it.
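
For what it's worth, the next things I plan to check on the ctrl that 
should own the VIP (a sketch):

  ip addr show br-public | grep x.x.x.4   # does keepalived actually hold the VIP?
  tcpdump -ni br-public vrrp              # are VRRP advertisements flowing?
  arping -I br-public x.x.x.4             # from another host: does the VIP answer ARP?

but maybe someone spots a config problem right away.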

Any suggestion?

Thanks a lot,
Matteo

Matteo Gerola, Dott.
Research Engineer

CREATE-NET Research Center
Fondazione Bruno Kessler (FBK)
via alla Cascata 56D
38123 Povo, Trento (Italy)
F: +39 0461 312425
e-mail: mger...@fbk.eu <--- THIS HAS CHANGED!
www: http://create-net.fbk.eu


