Re: [openstack-dev] [nova][neutron]Fail to communicate new host when the first host for a new instance fails

2015-05-14 Thread Neil Jerram

Hi Rossella,

Many thanks for your quick reply!

On 14/05/15 11:08, Rossella Sblendido wrote:

> Hi Neil,
>
> what's the status of the port after the migration? You might be hitting
> [1]. See also the patch that fixes the issue [2].


Thanks, but that is definitely not the cause of the problem in my case, 
because my agent does not call get_device_details.


(BTW - it seems obviously wrong to me for an API named 
get_device_details to change the port status to BUILD, even if the call 
is coming from the correct host.  I would expect that an agent could 
safely call get_device_details at any time without having any effect on 
the port state.)



> If you wait a bit longer, is the host_id updated by Nova?


No, it isn't.

I've now reproduced this and looked directly at the Neutron DB, and I
think what I see indicates that this is an OpenStack bug (as opposed to
a problem in my mechanism driver).


My hosts are named calico-vm13 and calico-vm15, and calico-vm13 is set 
up so that libvirt will fail to launch any instances.  When I use the 
Horizon UI to create an instance, Nova tries calico-vm13 first - which 
fails - and then calico-vm15, which succeeds.


Horizon then shows that the instance is on calico-vm15:

admin | calico-vm15 | dltst | cirros-0.3.2-x86_64 | 10.28.29.214, 2001:db8:c41:2::1d9a | m1.tiny | Active | None | Running | 24 minutes

The port for that instance is the cc80291c one here:

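mysql> select * from ports;
+------------+-------------+------+-------------+-------------------+----------------+--------+-------------+--------------+
| tenant_id  | id          | name | network_id  | mac_address       | admin_state_up | status | device_id   | device_owner |
+------------+-------------+------+-------------+-------------------+----------------+--------+-------------+--------------+
| b2d9f70... | 79fd9d6c... |      | 1fca4aa4... | fa:16:3e:d3:1a:62 |              1 | DOWN   | dhcpea9f... | network:dhcp |
| b2d9f70... | cc80291c... |      | 1fca4aa4... | fa:16:3e:bc:df:f0 |              1 | ACTIVE | e2b61f5f... | compute:None |
| b2d9f70... | d9f7d1d0... |      | 1fca4aa4... | fa:16:3e:0b:29:3e |              1 | DOWN   | dhcp2ffe... | network:dhcp |
+------------+-------------+------+-------------+-------------------+----------------+--------+-------------+--------------+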


And the ml2_port_bindings table shows that Neutron/ML2 thinks that port 
is still on calico-vm13:


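mysql> select * from ml2_port_bindings;
+-------------+-------------+----------+--------+-------------+-----------+---------------------+---------+
| port_id     | host        | vif_type | driver | segment     | vnic_type | vif_details         | profile |
+-------------+-------------+----------+--------+-------------+-----------+---------------------+---------+
| 79fd9d6c... | calico-vm13 | tap      | calico | fdc5ef44... | normal    | {port_filter: true} |         |
| cc80291c... | calico-vm13 | tap      | calico | fdc5ef44... | normal    | {port_filter: true} |         |
| d9f7d1d0... | calico-vm15 | tap      | calico | fdc5ef44... | normal    | {port_filter: true} |         |
+-------------+-------------+----------+--------+-------------+-----------+---------------------+---------+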
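
The manual comparison of the two tables above can be sketched as a small
script.  This is only an illustration under stated assumptions: an
in-memory SQLite database stands in for the Neutron MySQL DB, only the
columns needed for the check are modelled, and the helper names
(build_example_db, find_stale_bindings) and the expected_hosts input are
hypothetical.  The expected host is whatever Nova/Horizon reports for the
instance, supplied by hand.

```python
import sqlite3


def build_example_db():
    """Create an in-memory DB holding the rows quoted in the message.

    SQLite is an assumption made so the sketch runs anywhere; the real
    tables live in the Neutron MySQL database and have more columns.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ports (id TEXT PRIMARY KEY, status TEXT, device_owner TEXT)"
    )
    conn.execute(
        "CREATE TABLE ml2_port_bindings (port_id TEXT PRIMARY KEY, host TEXT)"
    )
    conn.executemany(
        "INSERT INTO ports VALUES (?, ?, ?)",
        [
            ("79fd9d6c...", "DOWN", "network:dhcp"),
            ("cc80291c...", "ACTIVE", "compute:None"),
            ("d9f7d1d0...", "DOWN", "network:dhcp"),
        ],
    )
    conn.executemany(
        "INSERT INTO ml2_port_bindings VALUES (?, ?)",
        [
            ("79fd9d6c...", "calico-vm13"),
            ("cc80291c...", "calico-vm13"),
            ("d9f7d1d0...", "calico-vm15"),
        ],
    )
    return conn


def find_stale_bindings(conn, expected_hosts):
    """Return (port_id, bound_host, expected_host) for mismatched compute ports.

    expected_hosts maps a port id to the host the instance actually runs
    on (hypothetical input, e.g. read off the Horizon instance list).
    """
    rows = conn.execute(
        "SELECT p.id, b.host FROM ports p "
        "JOIN ml2_port_bindings b ON b.port_id = p.id "
        "WHERE p.device_owner LIKE 'compute:%' ORDER BY p.id"
    ).fetchall()
    return [
        (port_id, host, expected_hosts[port_id])
        for port_id, host in rows
        if port_id in expected_hosts and expected_hosts[port_id] != host
    ]


if __name__ == "__main__":
    db = build_example_db()
    for port_id, bound, expected in find_stale_bindings(
        db, {"cc80291c...": "calico-vm15"}
    ):
        print(f"port {port_id}: bound to {bound}, instance is on {expected}")
```

With the rows from this message, the only compute port (cc80291c...) is
reported as bound to calico-vm13 while the instance runs on calico-vm15.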



Where should I start looking, to see where Nova / Neutron _should_ be 
updating the port binding, in this scenario?


Many thanks,
Neil



> cheers,
>
> Rossella
>
> [1] https://bugs.launchpad.net/neutron/+bug/1439857
> [2] https://review.openstack.org/#/c/163178/


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev





Re: [openstack-dev] [nova][neutron]Fail to communicate new host when the first host for a new instance fails

2015-05-14 Thread Neil Jerram

On 14/05/15 16:04, Brian Haley wrote:

> On 05/14/2015 05:29 AM, Neil Jerram wrote:
>
>> Hi all, this is about a problem I'm seeing with my Neutron ML2 mechanism
>> driver [1].  I'm expecting to see an update_port_postcommit call to signal
>> that the binding:host_id for a port is changing, but I don't see that.
>>
>> The scenario is launching a new instance in a cluster with two compute
>> hosts, where we've rigged things so that one of the compute hosts will
>> always be chosen first, but libvirt isn't correctly configured there and
>> hence the instance launching attempt will fail.  Then Nova tries to use the
>> other compute host instead, and that mostly works - except that my
>> mechanism driver still thinks that the new instance's port is bound to the
>> first compute host.
>>
>> Is anyone aware of a known problem in this area (in Juno-level code), or
>> where I could look to start pinning this down in more detail?
>
> We saw something like this before, perhaps:
>
> https://review.openstack.org/#/c/98340/
> https://bugs.launchpad.net/nova/+bug/1327124
>
> If that's it, it's fixed only in Kilo.


Thanks so much, Brian.  This does indeed look like the right fix for the 
problem that I'm seeing.  I'll report back once we've tested further here.


Regards,
Neil




Re: [openstack-dev] [nova][neutron]Fail to communicate new host when the first host for a new instance fails

2015-05-14 Thread Brian Haley

On 05/14/2015 05:29 AM, Neil Jerram wrote:

> Hi all, this is about a problem I'm seeing with my Neutron ML2 mechanism driver
> [1].  I'm expecting to see an update_port_postcommit call to signal that the
> binding:host_id for a port is changing, but I don't see that.
>
> The scenario is launching a new instance in a cluster with two compute hosts,
> where we've rigged things so that one of the compute hosts will always be chosen
> first, but libvirt isn't correctly configured there and hence the instance
> launching attempt will fail.  Then Nova tries to use the other compute host
> instead, and that mostly works - except that my mechanism driver still thinks
> that the new instance's port is bound to the first compute host.
>
> Is anyone aware of a known problem in this area (in Juno-level code), or where I
> could look to start pinning this down in more detail?


We saw something like this before, perhaps:

https://review.openstack.org/#/c/98340/
https://bugs.launchpad.net/nova/+bug/1327124

If that's it, it's fixed only in Kilo.

-Brian



[openstack-dev] [nova][neutron]Fail to communicate new host when the first host for a new instance fails

2015-05-14 Thread Neil Jerram
Hi all, this is about a problem I'm seeing with my Neutron ML2 mechanism 
driver [1].  I'm expecting to see an update_port_postcommit call to 
signal that the binding:host_id for a port is changing, but I don't see 
that.
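
For illustration, the kind of check a mechanism driver might make in
update_port_postcommit could look like the sketch below.  It assumes, as
in the ML2 driver API, that the context passed to the driver exposes the
port's new and previous state as dicts via `current` and `original`.
FakePortContext, SketchMechanismDriver and host_change are hypothetical
names used only so the example runs without Neutron installed; this is
not the actual driver code.

```python
HOST_ID = "binding:host_id"


class FakePortContext:
    """Hypothetical stand-in for the context ML2 hands to a driver.

    Only the two attributes this sketch relies on are modelled:
    `original` (port dict before the update) and `current` (after).
    """

    def __init__(self, original, current):
        self.original = original
        self.current = current


def host_change(context):
    """Return (old_host, new_host) if binding:host_id changed, else None."""
    old_host = (context.original or {}).get(HOST_ID)
    new_host = context.current.get(HOST_ID)
    if old_host != new_host:
        return (old_host, new_host)
    return None


class SketchMechanismDriver:
    """Toy driver that records every host move it is told about."""

    def __init__(self):
        self.moves = []

    def update_port_postcommit(self, context):
        change = host_change(context)
        if change is not None:
            # A real driver would reprogram the old and new hosts here.
            self.moves.append((context.current["id"],) + change)


if __name__ == "__main__":
    driver = SketchMechanismDriver()
    driver.update_port_postcommit(
        FakePortContext(
            original={"id": "port-1", HOST_ID: "host-a"},
            current={"id": "port-1", HOST_ID: "host-b"},
        )
    )
    print(driver.moves)
```

In the failing scenario described in this message, no such call arrives
with a changed binding:host_id, so a driver written this way would never
record the move to the second host.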


The scenario is launching a new instance in a cluster with two compute 
hosts, where we've rigged things so that one of the compute hosts will 
always be chosen first, but libvirt isn't correctly configured there and 
hence the instance launching attempt will fail.  Then Nova tries to use 
the other compute host instead, and that mostly works - except that my
mechanism driver still thinks that the new instance's port is bound to
the first compute host.


Is anyone aware of a known problem in this area (in Juno-level code), or
where I could look to start pinning this down in more detail?


Many thanks,
Neil



Re: [openstack-dev] [nova][neutron]Fail to communicate new host when the first host for a new instance fails

2015-05-14 Thread Rossella Sblendido
Hi Neil,

what's the status of the port after the migration? You might be hitting
[1]. See also the patch that fixes the issue [2].

If you wait a bit longer, is the host_id updated by Nova?

cheers,

Rossella

[1] https://bugs.launchpad.net/neutron/+bug/1439857
[2] https://review.openstack.org/#/c/163178/

 
