Re: [openstack-dev] [nova][neutron]Fail to communicate new host when the first host for a new instance fails
Hi Rossella,

Many thanks for your quick reply!

On 14/05/15 11:08, Rossella Sblendido wrote:
> Hi Neil,
>
> what's the status of the port after the migration? You might be hitting [1]. See also the patch that fixes the issue [2].

Thanks, but that is definitely not the cause of the problem in my case, because my agent does not call get_device_details.

(BTW - it seems obviously wrong to me for an API named get_device_details to change the port status to BUILD, even if the call is coming from the correct host. I would expect that an agent could safely call get_device_details at any time without having any effect on the port state.)

> If you wait a bit longer, is the host_id updated by Nova?

No, it isn't. I've now been able to reproduce this again and look directly at the Neutron DB, and I think what I see indicates that this is definitely an OpenStack bug (as opposed to a problem in my mechanism driver).

My hosts are named calico-vm13 and calico-vm15, and calico-vm13 is set up so that libvirt will fail to launch any instances. When I use the Horizon UI to create an instance, Nova tries calico-vm13 first - which fails - and then calico-vm15, which succeeds.

Horizon then shows that the instance is on calico-vm15:

admin calico-vm15 dltst cirros-0.3.2-x86_64 10.28.29.214 2001:db8:c41:2::1d9a m1.tiny Active None Running 24 minutes

The port for that instance is the cc80291c one here:

mysql> select * from ports;
+------------+-------------+------+-------------+-------------------+----------------+--------+-------------+--------------+
| tenant_id  | id          | name | network_id  | mac_address       | admin_state_up | status | device_id   | device_owner |
+------------+-------------+------+-------------+-------------------+----------------+--------+-------------+--------------+
| b2d9f70... | 79fd9d6c... |      | 1fca4aa4... | fa:16:3e:d3:1a:62 | 1              | DOWN   | dhcpea9f... | network:dhcp |
| b2d9f70... | cc80291c... |      | 1fca4aa4... | fa:16:3e:bc:df:f0 | 1              | ACTIVE | e2b61f5f... | compute:None |
| b2d9f70... | d9f7d1d0... |      | 1fca4aa4... | fa:16:3e:0b:29:3e | 1              | DOWN   | dhcp2ffe... | network:dhcp |
+------------+-------------+------+-------------+-------------------+----------------+--------+-------------+--------------+

And the ml2_port_bindings table shows that Neutron/ML2 thinks that port is still on calico-vm13:

mysql> select * from ml2_port_bindings;
+-------------+-------------+----------+--------+-------------+-----------+---------------------+---------+
| port_id     | host        | vif_type | driver | segment     | vnic_type | vif_details         | profile |
+-------------+-------------+----------+--------+-------------+-----------+---------------------+---------+
| 79fd9d6c... | calico-vm13 | tap      | calico | fdc5ef44... | normal    | {port_filter: true} |         |
| cc80291c... | calico-vm13 | tap      | calico | fdc5ef44... | normal    | {port_filter: true} |         |
| d9f7d1d0... | calico-vm15 | tap      | calico | fdc5ef44... | normal    | {port_filter: true} |         |
+-------------+-------------+----------+--------+-------------+-----------+---------------------+---------+

Where should I start looking to see where Nova / Neutron _should_ be updating the port binding in this scenario?

Many thanks,
Neil

> cheers,
> Rossella
>
> [1] https://bugs.launchpad.net/neutron/+bug/1439857
> [2] https://review.openstack.org/#/c/163178/
>
> On 05/14/2015 11:29 AM, Neil Jerram wrote:
>> Hi all,
>>
>> this is about a problem I'm seeing with my Neutron ML2 mechanism driver [1]. I'm expecting to see an update_port_postcommit call to signal that the binding:host_id for a port is changing, but I don't see that.
>>
>> The scenario is launching a new instance in a cluster with two compute hosts, where we've rigged things so that one of the compute hosts will always be chosen first, but libvirt isn't correctly configured there and hence the instance launching attempt will fail. Then Nova tries to use the other compute host instead, and that mostly works - except that my mechanism driver still thinks that the new instance's port is bound to the first compute host.
>>
>> Is anyone aware of a known problem in this area (in Juno-level code), or where I could look to start pinning this down in more detail?
>>
>> Many thanks,
>> Neil

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
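The comparison of the ports and ml2_port_bindings tables described in this message can also be done with a single join. What follows is only a sketch, not something taken from Neutron itself: it loads the rows quoted above into an in-memory SQLite database with a cut-down schema (just the columns needed here, which is an assumption of this example), and the helper name bound_hosts is hypothetical. The truncated IDs are kept exactly as they appear in the message.

```python
import sqlite3

# Simplified stand-in for the Neutron DB: only the columns used below.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ports (id TEXT PRIMARY KEY, status TEXT, device_owner TEXT);
CREATE TABLE ml2_port_bindings (port_id TEXT PRIMARY KEY, host TEXT);
""")

# Rows copied from the mysql output in the message (IDs left truncated).
conn.executemany("INSERT INTO ports VALUES (?, ?, ?)", [
    ("79fd9d6c...", "DOWN", "network:dhcp"),
    ("cc80291c...", "ACTIVE", "compute:None"),
    ("d9f7d1d0...", "DOWN", "network:dhcp"),
])
conn.executemany("INSERT INTO ml2_port_bindings VALUES (?, ?)", [
    ("79fd9d6c...", "calico-vm13"),
    ("cc80291c...", "calico-vm13"),
    ("d9f7d1d0...", "calico-vm15"),
])


def bound_hosts(conn, owner_prefix="compute:"):
    """Return (port id, status, bound host) for ports owned by instances."""
    return conn.execute(
        "SELECT p.id, p.status, b.host "
        "FROM ports p JOIN ml2_port_bindings b ON b.port_id = p.id "
        "WHERE p.device_owner LIKE ? ORDER BY p.id",
        (owner_prefix + "%",),
    ).fetchall()


for port_id, status, host in bound_hosts(conn):
    print(port_id, status, host)
```

With the data from this message, the instance's port comes back as ACTIVE but bound to calico-vm13, which is the mismatch with what Horizon reports.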
Re: [openstack-dev] [nova][neutron]Fail to communicate new host when the first host for a new instance fails
On 14/05/15 16:04, Brian Haley wrote:
> On 05/14/2015 05:29 AM, Neil Jerram wrote:
>> Hi all,
>>
>> this is about a problem I'm seeing with my Neutron ML2 mechanism driver [1]. I'm expecting to see an update_port_postcommit call to signal that the binding:host_id for a port is changing, but I don't see that.
>>
>> The scenario is launching a new instance in a cluster with two compute hosts, where we've rigged things so that one of the compute hosts will always be chosen first, but libvirt isn't correctly configured there and hence the instance launching attempt will fail. Then Nova tries to use the other compute host instead, and that mostly works - except that my mechanism driver still thinks that the new instance's port is bound to the first compute host.
>>
>> Is anyone aware of a known problem in this area (in Juno-level code), or where I could look to start pinning this down in more detail?
>
> We saw something like this before, perhaps:
>
> https://review.openstack.org/#/c/98340/
> https://bugs.launchpad.net/nova/+bug/1327124
>
> If that's it, it's fixed in Kilo only.

Thanks so much, Brian. This does indeed look like the right fix for the problem that I'm seeing. I'll report back once we've tested further here.

Regards,
Neil
Re: [openstack-dev] [nova][neutron]Fail to communicate new host when the first host for a new instance fails
On 05/14/2015 05:29 AM, Neil Jerram wrote:
> Hi all,
>
> this is about a problem I'm seeing with my Neutron ML2 mechanism driver [1]. I'm expecting to see an update_port_postcommit call to signal that the binding:host_id for a port is changing, but I don't see that.
>
> The scenario is launching a new instance in a cluster with two compute hosts, where we've rigged things so that one of the compute hosts will always be chosen first, but libvirt isn't correctly configured there and hence the instance launching attempt will fail. Then Nova tries to use the other compute host instead, and that mostly works - except that my mechanism driver still thinks that the new instance's port is bound to the first compute host.
>
> Is anyone aware of a known problem in this area (in Juno-level code), or where I could look to start pinning this down in more detail?

We saw something like this before, perhaps:

https://review.openstack.org/#/c/98340/
https://bugs.launchpad.net/nova/+bug/1327124

If that's it, it's fixed in Kilo only.

-Brian
[openstack-dev] [nova][neutron]Fail to communicate new host when the first host for a new instance fails
Hi all,

this is about a problem I'm seeing with my Neutron ML2 mechanism driver [1]. I'm expecting to see an update_port_postcommit call to signal that the binding:host_id for a port is changing, but I don't see that.

The scenario is launching a new instance in a cluster with two compute hosts, where we've rigged things so that one of the compute hosts will always be chosen first, but libvirt isn't correctly configured there and hence the instance launching attempt will fail. Then Nova tries to use the other compute host instead, and that mostly works - except that my mechanism driver still thinks that the new instance's port is bound to the first compute host.

Is anyone aware of a known problem in this area (in Juno-level code), or where I could look to start pinning this down in more detail?

Many thanks,
Neil
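To make the expectation in this message concrete, here is a rough sketch of how a mechanism driver might notice a binding:host_id change in update_port_postcommit. It is not the actual driver discussed in this thread. FakePortContext and HostTrackingDriver are hypothetical names invented for illustration, and the fake context models only the `current` and `original` port dictionaries that ML2 hands to a driver, so the example runs without Neutron installed.

```python
# Hypothetical stand-in for the ML2 port context: it carries only the
# port dict before the update (`original`) and after it (`current`).
class FakePortContext:
    def __init__(self, original, current):
        self.original = original
        self.current = current


# Hypothetical driver fragment; a real driver would subclass the ML2
# MechanismDriver base class and implement its other methods too.
class HostTrackingDriver:
    """Records every binding:host_id change seen in update_port_postcommit."""

    def __init__(self):
        self.moves = []

    def update_port_postcommit(self, context):
        old_host = (context.original or {}).get("binding:host_id")
        new_host = context.current.get("binding:host_id")
        if old_host != new_host:
            # This is the notification the message above expects to
            # receive when Nova retries the instance on another host.
            self.moves.append((context.current["id"], old_host, new_host))


driver = HostTrackingDriver()

# Port re-bound from the failed host to the host that succeeded.
driver.update_port_postcommit(FakePortContext(
    original={"id": "cc80291c...", "binding:host_id": "calico-vm13"},
    current={"id": "cc80291c...", "binding:host_id": "calico-vm15"},
))

# An update that leaves the host alone is not recorded.
driver.update_port_postcommit(FakePortContext(
    original={"id": "cc80291c...", "binding:host_id": "calico-vm15"},
    current={"id": "cc80291c...", "binding:host_id": "calico-vm15"},
))

print(driver.moves)
```

In the failure described here, no such call arrives with a changed host, so a driver written this way would keep treating the port as bound to the first compute host.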
Re: [openstack-dev] [nova][neutron]Fail to communicate new host when the first host for a new instance fails
Hi Neil,

what's the status of the port after the migration? You might be hitting [1]. See also the patch that fixes the issue [2].

If you wait a bit longer, is the host_id updated by Nova?

cheers,
Rossella

[1] https://bugs.launchpad.net/neutron/+bug/1439857
[2] https://review.openstack.org/#/c/163178/

On 05/14/2015 11:29 AM, Neil Jerram wrote:
> Hi all,
>
> this is about a problem I'm seeing with my Neutron ML2 mechanism driver [1]. I'm expecting to see an update_port_postcommit call to signal that the binding:host_id for a port is changing, but I don't see that.
>
> The scenario is launching a new instance in a cluster with two compute hosts, where we've rigged things so that one of the compute hosts will always be chosen first, but libvirt isn't correctly configured there and hence the instance launching attempt will fail. Then Nova tries to use the other compute host instead, and that mostly works - except that my mechanism driver still thinks that the new instance's port is bound to the first compute host.
>
> Is anyone aware of a known problem in this area (in Juno-level code), or where I could look to start pinning this down in more detail?
>
> Many thanks,
> Neil