Re: [openstack-dev] [openstack-ansible] L3HA problem

2016-06-24 Thread fabrice grelaud

> On 22 June 2016 at 19:40, Assaf Muller wrote:
> 
> On Wed, Jun 22, 2016 at 12:02 PM, fabrice grelaud
> <fabrice.grel...@u-bordeaux.fr> wrote:
>> 
>> On 22 June 2016 at 17:35, fabrice grelaud wrote:
>> 
>> 
>> On 22 June 2016 at 15:45, Assaf Muller wrote:
>> 
>> On Wed, Jun 22, 2016 at 9:24 AM, fabrice grelaud
>>  wrote:
>> 
>> Hi,
>> 
>> we deployed our openstack infrastructure with your « exciting » project
>> openstack-ansible (mitaka 13.1.2) but we have some problems with L3HA after
>> create router.
>> 
>> Our infra (closer to the doc):
>> 3 controllers nodes (with bond0 (br-mgmt, br-storage), bond1 (br-vxlan,
>> br-vlan))
>> 2 compute nodes (same for network)
>> 
>> We create an external network (vlan type), an internal network (vxlan type)
>> and a router connected to both networks.
>> And when we launch an instance (cirros), we can’t receive an ip on the vm.
>> 
>> We have:
>> 
>> root@p-osinfra03-utility-container-783041da:~# neutron
>> l3-agent-list-hosting-router router-bim
>> +--+---++---+--+
>> | id   | host
>> | admin_state_up | alive | ha_state |
>> +--+---++---+--+
>> | 3c7918e5-3ad6-4f82-a81b-700790e3c016 |
>> p-osinfra01-neutron-agents-container-f1ab9c14 | True | :-)   |
>> active   |
>> | f2bf385a-f210-4dbc-8d7d-4b7b845c09b0 |
>> p-osinfra02-neutron-agents-container-48142ffe | True  | :-)   |
>> active   |
>> | 55350fac-16aa-488e-91fd-a7db38179c62 |
>> p-osinfra03-neutron-agents-container-2f6557f0 | True  | :-)   |
>> active   |
>> +--+---++---+—+
>> 
>> I know, i got a problem now because i should have :-) active, :-) standby,
>> :-) standby… Snif...
>> 
>> root@p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns
>> qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6
>> qdhcp-0ba266fb-15c4-4566-ae88-92d4c8fd2036
>> 
>> root@p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns exec
>> qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6 ip a sh
>> 1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group
>> default
>>   link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>   inet 127.0.0.1/8 scope host lo
>>  valid_lft forever preferred_lft forever
>>   inet6 ::1/128 scope host
>>  valid_lft forever preferred_lft forever
>> 2: ha-4a5f0287-91@if6:  mtu 1450 qdisc
>> pfifo_fast state UP group default qlen 1000
>>   link/ether fa:16:3e:c2:67:a9 brd ff:ff:ff:ff:ff:ff
>>   inet 169.254.192.1/18 brd 169.254.255.255 scope global ha-4a5f0287-91
>>  valid_lft forever preferred_lft forever
>>   inet 169.254.0.1/24 scope global ha-4a5f0287-91
>>  valid_lft forever preferred_lft forever
>>   inet6 fe80::f816:3eff:fec2:67a9/64 scope link
>>  valid_lft forever preferred_lft forever
>> 3: qr-44804d69-88@if9:  mtu 1450 qdisc
>> pfifo_fast state UP group default qlen 1000
>>   link/ether fa:16:3e:a5:8c:f2 brd ff:ff:ff:ff:ff:ff
>>   inet 192.168.100.254/24 scope global qr-44804d69-88
>>  valid_lft forever preferred_lft forever
>>   inet6 fe80::f816:3eff:fea5:8cf2/64 scope link
>>  valid_lft forever preferred_lft forever
>> 4: qg-c5c7378e-1d@if12:  mtu 1500 qdisc
>> pfifo_fast state UP group default qlen 1000
>>   link/ether fa:16:3e:b6:4c:97 brd ff:ff:ff:ff:ff:ff
>>   inet 147.210.240.11/23 scope global qg-c5c7378e-1d
>>  valid_lft forever preferred_lft forever
>>   inet 147.210.240.12/32 scope global qg-c5c7378e-1d
>>  valid_lft forever preferred_lft forever
>>   inet6 fe80::f816:3eff:feb6:4c97/64 scope link
>>  valid_lft forever preferred_lft forever
>> 
>> Same result on infra02 and infra03, qr and qg interfaces have the same ip,
>> and ha interfaces the address 169.254.0.1.
>> 
>> If we stop 2 neutron agent containers (p-osinfra02, p-osinfra03) and we
>> restart the first (p-osinfra01), we can reboot the instance and we got an
>> ip, a floating ip and we can access by ssh from internet to the vm. (Note:
>> after few time, we loss our connectivity too).
>> 
>> But if we restart the two containers, we got a ha_state to « standby » until
>> the three become « active » and finally we have the problem again.
>> 
>> The three routers on infra 01/02/03 are seen as master.
>> 
>> If we ping from our instance to the router (internal network 192.168.100.4
>> to 192.168.100.254) we can see some ARP Request
>> ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
>> ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
>> ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
>> 
>> And on the compute node we see all these frames on the various interfaces
>> tap / vxlan-89 / br-vxlan / bond1.vxlanvlan / bond1 / em2 but nothing back.
>> 
>> We also have o

Re: [openstack-dev] [openstack-ansible] L3HA problem

2016-06-23 Thread Anna Kamyshnikova
Version 1.2.13 is reliable.

On Wed, Jun 22, 2016 at 8:40 PM, Assaf Muller  wrote:

> On Wed, Jun 22, 2016 at 12:02 PM, fabrice grelaud
>  wrote:
> >
> > On 22 June 2016 at 17:35, fabrice grelaud wrote:
> >
> >
> > On 22 June 2016 at 15:45, Assaf Muller wrote:
> >
> > On Wed, Jun 22, 2016 at 9:24 AM, fabrice grelaud
> >  wrote:
> >
> > Hi,
> >
> > we deployed our openstack infrastructure with your « exciting » project
> > openstack-ansible (mitaka 13.1.2) but we have some problems with L3HA
> after
> > create router.
> >
> > Our infra (closer to the doc):
> > 3 controllers nodes (with bond0 (br-mgmt, br-storage), bond1 (br-vxlan,
> > br-vlan))
> > 2 compute nodes (same for network)
> >
> > We create an external network (vlan type), an internal network (vxlan
> type)
> > and a router connected to both networks.
> > And when we launch an instance (cirros), we can’t receive an ip on the
> vm.
> >
> > We have:
> >
> > root@p-osinfra03-utility-container-783041da:~# neutron
> > l3-agent-list-hosting-router router-bim
> >
> +--+---++---+--+
> > | id   | host
> > | admin_state_up | alive | ha_state |
> >
> +--+---++---+--+
> > | 3c7918e5-3ad6-4f82-a81b-700790e3c016 |
> > p-osinfra01-neutron-agents-container-f1ab9c14 | True   | :-)   |
> > active   |
> > | f2bf385a-f210-4dbc-8d7d-4b7b845c09b0 |
> > p-osinfra02-neutron-agents-container-48142ffe | True   | :-)   |
> > active   |
> > | 55350fac-16aa-488e-91fd-a7db38179c62 |
> > p-osinfra03-neutron-agents-container-2f6557f0 | True   | :-)   |
> > active   |
> >
> +--+---++---+—+
> >
> > I know, i got a problem now because i should have :-) active, :-)
> standby,
> > :-) standby… Snif...
> >
> > root@p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns
> > qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6
> > qdhcp-0ba266fb-15c4-4566-ae88-92d4c8fd2036
> >
> > root@p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns exec
> > qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6 ip a sh
> > 1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group
> > default
> >link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> >inet 127.0.0.1/8 scope host lo
> >   valid_lft forever preferred_lft forever
> >inet6 ::1/128 scope host
> >   valid_lft forever preferred_lft forever
> > 2: ha-4a5f0287-91@if6:  mtu 1450 qdisc
> > pfifo_fast state UP group default qlen 1000
> >link/ether fa:16:3e:c2:67:a9 brd ff:ff:ff:ff:ff:ff
> >inet 169.254.192.1/18 brd 169.254.255.255 scope global ha-4a5f0287-91
> >   valid_lft forever preferred_lft forever
> >inet 169.254.0.1/24 scope global ha-4a5f0287-91
> >   valid_lft forever preferred_lft forever
> >inet6 fe80::f816:3eff:fec2:67a9/64 scope link
> >   valid_lft forever preferred_lft forever
> > 3: qr-44804d69-88@if9:  mtu 1450 qdisc
> > pfifo_fast state UP group default qlen 1000
> >link/ether fa:16:3e:a5:8c:f2 brd ff:ff:ff:ff:ff:ff
> >inet 192.168.100.254/24 scope global qr-44804d69-88
> >   valid_lft forever preferred_lft forever
> >inet6 fe80::f816:3eff:fea5:8cf2/64 scope link
> >   valid_lft forever preferred_lft forever
> > 4: qg-c5c7378e-1d@if12:  mtu 1500 qdisc
> > pfifo_fast state UP group default qlen 1000
> >link/ether fa:16:3e:b6:4c:97 brd ff:ff:ff:ff:ff:ff
> >inet 147.210.240.11/23 scope global qg-c5c7378e-1d
> >   valid_lft forever preferred_lft forever
> >inet 147.210.240.12/32 scope global qg-c5c7378e-1d
> >   valid_lft forever preferred_lft forever
> >inet6 fe80::f816:3eff:feb6:4c97/64 scope link
> >   valid_lft forever preferred_lft forever
> >
> > Same result on infra02 and infra03, qr and qg interfaces have the same
> ip,
> > and ha interfaces the address 169.254.0.1.
> >
> > If we stop 2 neutron agent containers (p-osinfra02, p-osinfra03) and we
> > restart the first (p-osinfra01), we can reboot the instance and we got an
> > ip, a floating ip and we can access by ssh from internet to the vm.
> (Note:
> > after few time, we loss our connectivity too).
> >
> > But if we restart the two containers, we got a ha_state to « standby »
> until
> > the three become « active » and finally we have the problem again.
> >
> > The three routers on infra 01/02/03 are seen as master.
> >
> > If we ping from our instance to the router (internal network
> 192.168.100.4
> > to 192.168.100.254) we can see some ARP Request
> > ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
> > ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
> > ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
> >
> > And on the compute node we see all these fra

Re: [openstack-dev] [openstack-ansible] L3HA problem

2016-06-22 Thread Assaf Muller
On Wed, Jun 22, 2016 at 12:02 PM, fabrice grelaud
 wrote:
>
> On 22 June 2016 at 17:35, fabrice grelaud wrote:
>
>
> On 22 June 2016 at 15:45, Assaf Muller wrote:
>
> On Wed, Jun 22, 2016 at 9:24 AM, fabrice grelaud
>  wrote:
>
> Hi,
>
> we deployed our openstack infrastructure with your « exciting » project
> openstack-ansible (mitaka 13.1.2) but we have some problems with L3HA after
> create router.
>
> Our infra (closer to the doc):
> 3 controllers nodes (with bond0 (br-mgmt, br-storage), bond1 (br-vxlan,
> br-vlan))
> 2 compute nodes (same for network)
>
> We create an external network (vlan type), an internal network (vxlan type)
> and a router connected to both networks.
> And when we launch an instance (cirros), we can’t receive an ip on the vm.
>
> We have:
>
> root@p-osinfra03-utility-container-783041da:~# neutron
> l3-agent-list-hosting-router router-bim
> +--+---++---+--+
> | id   | host
> | admin_state_up | alive | ha_state |
> +--+---++---+--+
> | 3c7918e5-3ad6-4f82-a81b-700790e3c016 |
> p-osinfra01-neutron-agents-container-f1ab9c14 | True   | :-)   |
> active   |
> | f2bf385a-f210-4dbc-8d7d-4b7b845c09b0 |
> p-osinfra02-neutron-agents-container-48142ffe | True   | :-)   |
> active   |
> | 55350fac-16aa-488e-91fd-a7db38179c62 |
> p-osinfra03-neutron-agents-container-2f6557f0 | True   | :-)   |
> active   |
> +--+---++---+—+
>
> I know, i got a problem now because i should have :-) active, :-) standby,
> :-) standby… Snif...
>
> root@p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns
> qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6
> qdhcp-0ba266fb-15c4-4566-ae88-92d4c8fd2036
>
> root@p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns exec
> qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6 ip a sh
> 1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group
> default
>link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>inet 127.0.0.1/8 scope host lo
>   valid_lft forever preferred_lft forever
>inet6 ::1/128 scope host
>   valid_lft forever preferred_lft forever
> 2: ha-4a5f0287-91@if6:  mtu 1450 qdisc
> pfifo_fast state UP group default qlen 1000
>link/ether fa:16:3e:c2:67:a9 brd ff:ff:ff:ff:ff:ff
>inet 169.254.192.1/18 brd 169.254.255.255 scope global ha-4a5f0287-91
>   valid_lft forever preferred_lft forever
>inet 169.254.0.1/24 scope global ha-4a5f0287-91
>   valid_lft forever preferred_lft forever
>inet6 fe80::f816:3eff:fec2:67a9/64 scope link
>   valid_lft forever preferred_lft forever
> 3: qr-44804d69-88@if9:  mtu 1450 qdisc
> pfifo_fast state UP group default qlen 1000
>link/ether fa:16:3e:a5:8c:f2 brd ff:ff:ff:ff:ff:ff
>inet 192.168.100.254/24 scope global qr-44804d69-88
>   valid_lft forever preferred_lft forever
>inet6 fe80::f816:3eff:fea5:8cf2/64 scope link
>   valid_lft forever preferred_lft forever
> 4: qg-c5c7378e-1d@if12:  mtu 1500 qdisc
> pfifo_fast state UP group default qlen 1000
>link/ether fa:16:3e:b6:4c:97 brd ff:ff:ff:ff:ff:ff
>inet 147.210.240.11/23 scope global qg-c5c7378e-1d
>   valid_lft forever preferred_lft forever
>inet 147.210.240.12/32 scope global qg-c5c7378e-1d
>   valid_lft forever preferred_lft forever
>inet6 fe80::f816:3eff:feb6:4c97/64 scope link
>   valid_lft forever preferred_lft forever
>
> Same result on infra02 and infra03, qr and qg interfaces have the same ip,
> and ha interfaces the address 169.254.0.1.
>
> If we stop 2 neutron agent containers (p-osinfra02, p-osinfra03) and we
> restart the first (p-osinfra01), we can reboot the instance and we got an
> ip, a floating ip and we can access by ssh from internet to the vm. (Note:
> after few time, we loss our connectivity too).
>
> But if we restart the two containers, we got a ha_state to « standby » until
> the three become « active » and finally we have the problem again.
>
> The three routers on infra 01/02/03 are seen as master.
>
> If we ping from our instance to the router (internal network 192.168.100.4
> to 192.168.100.254) we can see some ARP Request
> ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
> ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
> ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
>
> And on the compute node we see all these frames on the various interfaces
> tap / vxlan-89 / br-vxlan / bond1.vxlanvlan / bond1 / em2 but nothing back.
>
> We also have on ha interface, on each router, the VRRP communication
> (heartbeat packets over a hidden project network that connects all ha
> routers (vxlan 70) ) . Priori as normal, each router thinks to be mas

Re: [openstack-dev] [openstack-ansible] L3HA problem

2016-06-22 Thread fabrice grelaud

> On 22 June 2016 at 17:35, fabrice grelaud wrote:
> 
>> 
>> On 22 June 2016 at 15:45, Assaf Muller wrote:
>> 
>> On Wed, Jun 22, 2016 at 9:24 AM, fabrice grelaud
>> <fabrice.grel...@u-bordeaux.fr> wrote:
>>> Hi,
>>> 
>>> we deployed our openstack infrastructure with your « exciting » project
>>> openstack-ansible (mitaka 13.1.2) but we have some problems with L3HA after
>>> create router.
>>> 
>>> Our infra (closer to the doc):
>>> 3 controllers nodes (with bond0 (br-mgmt, br-storage), bond1 (br-vxlan,
>>> br-vlan))
>>> 2 compute nodes (same for network)
>>> 
>>> We create an external network (vlan type), an internal network (vxlan type)
>>> and a router connected to both networks.
>>> And when we launch an instance (cirros), we can’t receive an ip on the vm.
>>> 
>>> We have:
>>> 
>>> root@p-osinfra03-utility-container-783041da:~# neutron
>>> l3-agent-list-hosting-router router-bim
>>> +--+---++---+--+
>>> | id   | host
>>> | admin_state_up | alive | ha_state |
>>> +--+---++---+--+
>>> | 3c7918e5-3ad6-4f82-a81b-700790e3c016 |
>>> p-osinfra01-neutron-agents-container-f1ab9c14 | True   | :-)   |
>>> active   |
>>> | f2bf385a-f210-4dbc-8d7d-4b7b845c09b0 |
>>> p-osinfra02-neutron-agents-container-48142ffe | True   | :-)   |
>>> active   |
>>> | 55350fac-16aa-488e-91fd-a7db38179c62 |
>>> p-osinfra03-neutron-agents-container-2f6557f0 | True   | :-)   |
>>> active   |
>>> +--+---++---+—+
>>> 
>>> I know, i got a problem now because i should have :-) active, :-) standby,
>>> :-) standby… Snif...
>>> 
>>> root@p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns
>>> qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6
>>> qdhcp-0ba266fb-15c4-4566-ae88-92d4c8fd2036
>>> 
>>> root@p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns exec
>>> qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6 ip a sh
>>> 1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group
>>> default
>>>link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>>inet 127.0.0.1/8 scope host lo
>>>   valid_lft forever preferred_lft forever
>>>inet6 ::1/128 scope host
>>>   valid_lft forever preferred_lft forever
>>> 2: ha-4a5f0287-91@if6:  mtu 1450 qdisc
>>> pfifo_fast state UP group default qlen 1000
>>>link/ether fa:16:3e:c2:67:a9 brd ff:ff:ff:ff:ff:ff
>>>inet 169.254.192.1/18 brd 169.254.255.255 scope global ha-4a5f0287-91
>>>   valid_lft forever preferred_lft forever
>>>inet 169.254.0.1/24 scope global ha-4a5f0287-91
>>>   valid_lft forever preferred_lft forever
>>>inet6 fe80::f816:3eff:fec2:67a9/64 scope link
>>>   valid_lft forever preferred_lft forever
>>> 3: qr-44804d69-88@if9:  mtu 1450 qdisc
>>> pfifo_fast state UP group default qlen 1000
>>>link/ether fa:16:3e:a5:8c:f2 brd ff:ff:ff:ff:ff:ff
>>>inet 192.168.100.254/24 scope global qr-44804d69-88
>>>   valid_lft forever preferred_lft forever
>>>inet6 fe80::f816:3eff:fea5:8cf2/64 scope link
>>>   valid_lft forever preferred_lft forever
>>> 4: qg-c5c7378e-1d@if12:  mtu 1500 qdisc
>>> pfifo_fast state UP group default qlen 1000
>>>link/ether fa:16:3e:b6:4c:97 brd ff:ff:ff:ff:ff:ff
>>>inet 147.210.240.11/23 scope global qg-c5c7378e-1d
>>>   valid_lft forever preferred_lft forever
>>>inet 147.210.240.12/32 scope global qg-c5c7378e-1d
>>>   valid_lft forever preferred_lft forever
>>>inet6 fe80::f816:3eff:feb6:4c97/64 scope link
>>>   valid_lft forever preferred_lft forever
>>> 
>>> Same result on infra02 and infra03, qr and qg interfaces have the same ip,
>>> and ha interfaces the address 169.254.0.1.
>>> 
>>> If we stop 2 neutron agent containers (p-osinfra02, p-osinfra03) and we
>>> restart the first (p-osinfra01), we can reboot the instance and we got an
>>> ip, a floating ip and we can access by ssh from internet to the vm. (Note:
>>> after few time, we loss our connectivity too).
>>> 
>>> But if we restart the two containers, we got a ha_state to « standby » until
>>> the three become « active » and finally we have the problem again.
>>> 
>>> The three routers on infra 01/02/03 are seen as master.
>>> 
>>> If we ping from our instance to the router (internal network 192.168.100.4
>>> to 192.168.100.254) we can see some ARP Request
>>> ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
>>> ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
>>> ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
>>> 
>>> And on the compute node we see all these frames on the various interfaces
>>> tap / vxlan-89 / br-vxlan / bond1.vxlanvlan / bond1 / em2 but nothing 

Re: [openstack-dev] [openstack-ansible] L3HA problem

2016-06-22 Thread fabrice grelaud
Thanks. I will test…

Do you think the trusty-backports package is enough (1:1.2.13-1~ubuntu14.04.1)?
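
Something like this is what I plan to run first, only as a rough sketch
(assuming the backports pocket is already enabled in the agents containers):

# inside one of the neutron agents containers
grep -r trusty-backports /etc/apt/sources.list /etc/apt/sources.list.d/
apt-get update
apt-cache policy keepalived   # candidate versions, backports included
keepalived -v                 # version currently installed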


> On 22 June 2016 at 16:21, Anna Kamyshnikova wrote:
> 
> Keepalived 1.2.7 is bad version. Please, see comments in this bug  
> https://bugs.launchpad.net/neutron/+bug/1497272. I suggest you to try one 
> of the latest version of Keepalived.
> 
> On Wed, Jun 22, 2016 at 5:03 PM, fabrice grelaud 
> <fabrice.grel...@u-bordeaux.fr> wrote:
> Hi,
> 
> keepalived 1:1.2.7-1ubuntu
> 
> 
>> On 22 June 2016 at 15:41, Anna Kamyshnikova wrote:
>> 
>> Hi!
>> 
>> What Keepalived version is used?
>> 
>> On Wed, Jun 22, 2016 at 4:24 PM, fabrice grelaud 
>> <fabrice.grel...@u-bordeaux.fr> wrote:
>> Hi,
>> 
>> we deployed our openstack infrastructure with your « exciting » project 
>> openstack-ansible (mitaka 13.1.2) but we have some problems with L3HA after 
>> create router.
>> 
>> Our infra (closer to the doc):
>> 3 controllers nodes (with bond0 (br-mgmt, br-storage), bond1 (br-vxlan, 
>> br-vlan))
>> 2 compute nodes (same for network)
>> 
>> We create an external network (vlan type), an internal network (vxlan type) 
>> and a router connected to both networks.
>> And when we launch an instance (cirros), we can’t receive an ip on the vm.
>> 
>> We have:
>> 
>> root@p-osinfra03-utility-container-783041da:~# neutron 
>> l3-agent-list-hosting-router router-bim
>> +--+---++---+--+
>> | id   | host
>>   | admin_state_up | alive | ha_state |
>> +--+---++---+--+
>> | 3c7918e5-3ad6-4f82-a81b-700790e3c016 | 
>> p-osinfra01-neutron-agents-container-f1ab9c14 | True   | :-)   | 
>> active   |
>> | f2bf385a-f210-4dbc-8d7d-4b7b845c09b0 | 
>> p-osinfra02-neutron-agents-container-48142ffe | True   | :-)   | 
>> active   |
>> | 55350fac-16aa-488e-91fd-a7db38179c62 | 
>> p-osinfra03-neutron-agents-container-2f6557f0 | True   | :-)   | 
>> active   |
>> +--+---++---+—+
>> 
>> I know, i got a problem now because i should have :-) active, :-) standby, 
>> :-) standby… Snif...
>> 
>> root@p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns
>> qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6
>> qdhcp-0ba266fb-15c4-4566-ae88-92d4c8fd2036
>> 
>> root@p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns exec 
>> qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6 ip a sh
>> 1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group 
>> default 
>> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>> inet 127.0.0.1/8  scope host lo
>>valid_lft forever preferred_lft forever
>> inet6 ::1/128 scope host 
>>valid_lft forever preferred_lft forever
>> 2: ha-4a5f0287-91@if6:  mtu 1450 qdisc 
>> pfifo_fast state UP group default qlen 1000
>> link/ether fa:16:3e:c2:67:a9 brd ff:ff:ff:ff:ff:ff
>> inet 169.254.192.1/18  brd 169.254.255.255 
>> scope global ha-4a5f0287-91
>>valid_lft forever preferred_lft forever
>> inet 169.254.0.1/24  scope global ha-4a5f0287-91
>>valid_lft forever preferred_lft forever
>> inet6 fe80::f816:3eff:fec2:67a9/64 scope link 
>>valid_lft forever preferred_lft forever
>> 3: qr-44804d69-88@if9:  mtu 1450 qdisc 
>> pfifo_fast state UP group default qlen 1000
>> link/ether fa:16:3e:a5:8c:f2 brd ff:ff:ff:ff:ff:ff
>> inet 192.168.100.254/24  scope global 
>> qr-44804d69-88
>>valid_lft forever preferred_lft forever
>> inet6 fe80::f816:3eff:fea5:8cf2/64 scope link 
>>valid_lft forever preferred_lft forever
>> 4: qg-c5c7378e-1d@if12:  mtu 1500 qdisc 
>> pfifo_fast state UP group default qlen 1000
>> link/ether fa:16:3e:b6:4c:97 brd ff:ff:ff:ff:ff:ff
>> inet 147.210.240.11/23  scope global 
>> qg-c5c7378e-1d
>>valid_lft forever preferred_lft forever
>> inet 147.210.240.12/32  scope global 
>> qg-c5c7378e-1d
>>valid_lft forever preferred_lft forever
>> inet6 fe80::f816:3eff:feb6:4c97/64 scope link 
>>valid_lft forever preferred_lft forever
>> 
>> Same result on infra02 and infra03, qr and qg interfaces have the same ip, 
>> and ha interfaces the address 169.254.0.1.
>> 
>> If we stop 2 neutron agent containers (p-osinfra02, p-osinfra03) and we 
>> restart the first (p-osinfra01), we can reboot the instance and we got an 
>> ip, a floating ip and we can access by ssh from internet to the vm. (Note: 
>> after few time, we loss our connectivity too).
>> 
>> B

Re: [openstack-dev] [openstack-ansible] L3HA problem

2016-06-22 Thread fabrice grelaud

> On 22 June 2016 at 15:45, Assaf Muller wrote:
> 
> On Wed, Jun 22, 2016 at 9:24 AM, fabrice grelaud
> <fabrice.grel...@u-bordeaux.fr> wrote:
>> Hi,
>> 
>> we deployed our openstack infrastructure with your « exciting » project
>> openstack-ansible (mitaka 13.1.2) but we have some problems with L3HA after
>> create router.
>> 
>> Our infra (closer to the doc):
>> 3 controllers nodes (with bond0 (br-mgmt, br-storage), bond1 (br-vxlan,
>> br-vlan))
>> 2 compute nodes (same for network)
>> 
>> We create an external network (vlan type), an internal network (vxlan type)
>> and a router connected to both networks.
>> And when we launch an instance (cirros), we can’t receive an ip on the vm.
>> 
>> We have:
>> 
>> root@p-osinfra03-utility-container-783041da:~# neutron
>> l3-agent-list-hosting-router router-bim
>> +--+---++---+--+
>> | id   | host
>> | admin_state_up | alive | ha_state |
>> +--+---++---+--+
>> | 3c7918e5-3ad6-4f82-a81b-700790e3c016 |
>> p-osinfra01-neutron-agents-container-f1ab9c14 | True   | :-)   |
>> active   |
>> | f2bf385a-f210-4dbc-8d7d-4b7b845c09b0 |
>> p-osinfra02-neutron-agents-container-48142ffe | True   | :-)   |
>> active   |
>> | 55350fac-16aa-488e-91fd-a7db38179c62 |
>> p-osinfra03-neutron-agents-container-2f6557f0 | True   | :-)   |
>> active   |
>> +--+---++---+—+
>> 
>> I know, i got a problem now because i should have :-) active, :-) standby,
>> :-) standby… Snif...
>> 
>> root@p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns
>> qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6
>> qdhcp-0ba266fb-15c4-4566-ae88-92d4c8fd2036
>> 
>> root@p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns exec
>> qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6 ip a sh
>> 1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group
>> default
>>link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>inet 127.0.0.1/8 scope host lo
>>   valid_lft forever preferred_lft forever
>>inet6 ::1/128 scope host
>>   valid_lft forever preferred_lft forever
>> 2: ha-4a5f0287-91@if6:  mtu 1450 qdisc
>> pfifo_fast state UP group default qlen 1000
>>link/ether fa:16:3e:c2:67:a9 brd ff:ff:ff:ff:ff:ff
>>inet 169.254.192.1/18 brd 169.254.255.255 scope global ha-4a5f0287-91
>>   valid_lft forever preferred_lft forever
>>inet 169.254.0.1/24 scope global ha-4a5f0287-91
>>   valid_lft forever preferred_lft forever
>>inet6 fe80::f816:3eff:fec2:67a9/64 scope link
>>   valid_lft forever preferred_lft forever
>> 3: qr-44804d69-88@if9:  mtu 1450 qdisc
>> pfifo_fast state UP group default qlen 1000
>>link/ether fa:16:3e:a5:8c:f2 brd ff:ff:ff:ff:ff:ff
>>inet 192.168.100.254/24 scope global qr-44804d69-88
>>   valid_lft forever preferred_lft forever
>>inet6 fe80::f816:3eff:fea5:8cf2/64 scope link
>>   valid_lft forever preferred_lft forever
>> 4: qg-c5c7378e-1d@if12:  mtu 1500 qdisc
>> pfifo_fast state UP group default qlen 1000
>>link/ether fa:16:3e:b6:4c:97 brd ff:ff:ff:ff:ff:ff
>>inet 147.210.240.11/23 scope global qg-c5c7378e-1d
>>   valid_lft forever preferred_lft forever
>>inet 147.210.240.12/32 scope global qg-c5c7378e-1d
>>   valid_lft forever preferred_lft forever
>>inet6 fe80::f816:3eff:feb6:4c97/64 scope link
>>   valid_lft forever preferred_lft forever
>> 
>> Same result on infra02 and infra03, qr and qg interfaces have the same ip,
>> and ha interfaces the address 169.254.0.1.
>> 
>> If we stop 2 neutron agent containers (p-osinfra02, p-osinfra03) and we
>> restart the first (p-osinfra01), we can reboot the instance and we got an
>> ip, a floating ip and we can access by ssh from internet to the vm. (Note:
>> after few time, we loss our connectivity too).
>> 
>> But if we restart the two containers, we got a ha_state to « standby » until
>> the three become « active » and finally we have the problem again.
>> 
>> The three routers on infra 01/02/03 are seen as master.
>> 
>> If we ping from our instance to the router (internal network 192.168.100.4
>> to 192.168.100.254) we can see some ARP Request
>> ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
>> ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
>> ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
>> 
>> And on the compute node we see all these frames on the various interfaces
>> tap / vxlan-89 / br-vxlan / bond1.vxlanvlan / bond1 / em2 but nothing back.
>> 
>> We also have on ha interface, on each router, the VRRP communication
>> (heartbeat packets over a hidden project network that connects all ha
>> routers (vxlan 70) ) . Priori as no

Re: [openstack-dev] [openstack-ansible] L3HA problem

2016-06-22 Thread Anna Kamyshnikova
Keepalived 1.2.7 is a bad version; please see the comments in this bug:
https://bugs.launchpad.net/neutron/+bug/1497272. I suggest you try one of the
latest versions of Keepalived.
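
Only as a sketch (assuming Ubuntu 14.04 hosts with the trusty-backports pocket
enabled; package source, service and container names may differ in your
deployment), the upgrade inside each neutron agents container would look
roughly like:

# inside each p-osinfra0X-neutron-agents-container-*
apt-get update
apt-get install -t trusty-backports keepalived
service neutron-l3-agent restart   # let the agent respawn keepalived for its HA routers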

On Wed, Jun 22, 2016 at 5:03 PM, fabrice grelaud <
fabrice.grel...@u-bordeaux.fr> wrote:

> Hi,
>
> keepalived 1:1.2.7-1ubuntu
>
>
> On 22 June 2016 at 15:41, Anna Kamyshnikova wrote:
>
> Hi!
>
> What Keepalived version is used?
>
> On Wed, Jun 22, 2016 at 4:24 PM, fabrice grelaud <
> fabrice.grel...@u-bordeaux.fr> wrote:
>
>> Hi,
>>
>> we deployed our openstack infrastructure with your « exciting » project
>> openstack-ansible (mitaka 13.1.2) but we have some problems with L3HA after
>> create router.
>>
>> Our infra (closer to the doc):
>> 3 controllers nodes (with bond0 (br-mgmt, br-storage), bond1 (br-vxlan,
>> br-vlan))
>> 2 compute nodes (same for network)
>>
>> We create an external network (vlan type), an internal network (vxlan
>> type) and a router connected to both networks.
>> And when we launch an instance (cirros), we can’t receive an ip on the vm.
>>
>> We have:
>>
>> root@p-osinfra03-utility-container-783041da:~# neutron
>> l3-agent-list-hosting-router router-bim
>>
>> +--+---++---+--+
>> | id   | host
>>  | admin_state_up | alive | ha_state |
>>
>> +--+---++---+--+
>> | 3c7918e5-3ad6-4f82-a81b-700790e3c016 |
>> p-osinfra01-neutron-agents-container-f1ab9c14 | True   | :-)   |
>> active   |
>> | f2bf385a-f210-4dbc-8d7d-4b7b845c09b0 |
>> p-osinfra02-neutron-agents-container-48142ffe | True   | :-)   |
>> active   |
>> | 55350fac-16aa-488e-91fd-a7db38179c62 |
>> p-osinfra03-neutron-agents-container-2f6557f0 | True   | :-)   |
>> active   |
>>
>> +--+---++---+—+
>>
>> I know, i got a problem now because i should have :-) active, :-)
>> standby, :-) standby… Snif...
>>
>> root@p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns
>> qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6
>> qdhcp-0ba266fb-15c4-4566-ae88-92d4c8fd2036
>>
>> root@p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns exec
>> qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6 ip a sh
>> 1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group
>> default
>> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>> inet 127.0.0.1/8 scope host lo
>>valid_lft forever preferred_lft forever
>> inet6 ::1/128 scope host
>>valid_lft forever preferred_lft forever
>> 2: ha-4a5f0287-91@if6:  mtu 1450 qdisc
>> pfifo_fast state UP group default qlen 1000
>> link/ether fa:16:3e:c2:67:a9 brd ff:ff:ff:ff:ff:ff
>> inet 169.254.192.1/18 brd 169.254.255.255 scope global ha-4a5f0287-91
>>valid_lft forever preferred_lft forever
>> inet 169.254.0.1/24 scope global ha-4a5f0287-91
>>valid_lft forever preferred_lft forever
>> inet6 fe80::f816:3eff:fec2:67a9/64 scope link
>>valid_lft forever preferred_lft forever
>> 3: qr-44804d69-88@if9:  mtu 1450 qdisc
>> pfifo_fast state UP group default qlen 1000
>> link/ether fa:16:3e:a5:8c:f2 brd ff:ff:ff:ff:ff:ff
>> inet 192.168.100.254/24 scope global qr-44804d69-88
>>valid_lft forever preferred_lft forever
>> inet6 fe80::f816:3eff:fea5:8cf2/64 scope link
>>valid_lft forever preferred_lft forever
>> 4: qg-c5c7378e-1d@if12:  mtu 1500 qdisc
>> pfifo_fast state UP group default qlen 1000
>> link/ether fa:16:3e:b6:4c:97 brd ff:ff:ff:ff:ff:ff
>> inet 147.210.240.11/23 scope global qg-c5c7378e-1d
>>valid_lft forever preferred_lft forever
>> inet 147.210.240.12/32 scope global qg-c5c7378e-1d
>>valid_lft forever preferred_lft forever
>> inet6 fe80::f816:3eff:feb6:4c97/64 scope link
>>valid_lft forever preferred_lft forever
>>
>> Same result on infra02 and infra03, qr and qg interfaces have the same
>> ip, and ha interfaces the address 169.254.0.1.
>>
>> If we stop 2 neutron agent containers (p-osinfra02, p-osinfra03) and we
>> restart the first (p-osinfra01), we can reboot the instance and we got an
>> ip, a floating ip and we can access by ssh from internet to the vm. (Note:
>> after few time, we loss our connectivity too).
>>
>> But if we restart the two containers, we got a ha_state to « standby »
>> until the three become « active » and finally we have the problem again.
>>
>> The three routers on infra 01/02/03 are seen as master.
>>
>> If we ping from our instance to the router (internal network
>> 192.168.100.4 to 192.168.100.254) we can see some ARP Request
>> ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
>> ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
>> ARP, Request who-has 192.1

Re: [openstack-dev] [openstack-ansible] L3HA problem

2016-06-22 Thread fabrice grelaud
Hi,

keepalived 1:1.2.7-1ubuntu


> On 22 June 2016 at 15:41, Anna Kamyshnikova wrote:
> 
> Hi!
> 
> What Keepalived version is used?
> 
> On Wed, Jun 22, 2016 at 4:24 PM, fabrice grelaud 
> <fabrice.grel...@u-bordeaux.fr> wrote:
> Hi,
> 
> we deployed our openstack infrastructure with your « exciting » project 
> openstack-ansible (mitaka 13.1.2) but we have some problems with L3HA after 
> create router.
> 
> Our infra (closer to the doc):
> 3 controllers nodes (with bond0 (br-mgmt, br-storage), bond1 (br-vxlan, 
> br-vlan))
> 2 compute nodes (same for network)
> 
> We create an external network (vlan type), an internal network (vxlan type) 
> and a router connected to both networks.
> And when we launch an instance (cirros), we can’t receive an ip on the vm.
> 
> We have:
> 
> root@p-osinfra03-utility-container-783041da:~# neutron 
> l3-agent-list-hosting-router router-bim
> +--+---++---+--+
> | id   | host 
>  | admin_state_up | alive | ha_state |
> +--+---++---+--+
> | 3c7918e5-3ad6-4f82-a81b-700790e3c016 | 
> p-osinfra01-neutron-agents-container-f1ab9c14 | True   | :-)   | 
> active   |
> | f2bf385a-f210-4dbc-8d7d-4b7b845c09b0 | 
> p-osinfra02-neutron-agents-container-48142ffe | True   | :-)   | 
> active   |
> | 55350fac-16aa-488e-91fd-a7db38179c62 | 
> p-osinfra03-neutron-agents-container-2f6557f0 | True   | :-)   | 
> active   |
> +--+---++---+—+
> 
> I know, i got a problem now because i should have :-) active, :-) standby, 
> :-) standby… Snif...
> 
> root@p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns
> qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6
> qdhcp-0ba266fb-15c4-4566-ae88-92d4c8fd2036
> 
> root@p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns exec 
> qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6 ip a sh
> 1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group 
> default 
> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> inet 127.0.0.1/8  scope host lo
>valid_lft forever preferred_lft forever
> inet6 ::1/128 scope host 
>valid_lft forever preferred_lft forever
> 2: ha-4a5f0287-91@if6:  mtu 1450 qdisc 
> pfifo_fast state UP group default qlen 1000
> link/ether fa:16:3e:c2:67:a9 brd ff:ff:ff:ff:ff:ff
> inet 169.254.192.1/18  brd 169.254.255.255 scope 
> global ha-4a5f0287-91
>valid_lft forever preferred_lft forever
> inet 169.254.0.1/24  scope global ha-4a5f0287-91
>valid_lft forever preferred_lft forever
> inet6 fe80::f816:3eff:fec2:67a9/64 scope link 
>valid_lft forever preferred_lft forever
> 3: qr-44804d69-88@if9:  mtu 1450 qdisc 
> pfifo_fast state UP group default qlen 1000
> link/ether fa:16:3e:a5:8c:f2 brd ff:ff:ff:ff:ff:ff
> inet 192.168.100.254/24  scope global 
> qr-44804d69-88
>valid_lft forever preferred_lft forever
> inet6 fe80::f816:3eff:fea5:8cf2/64 scope link 
>valid_lft forever preferred_lft forever
> 4: qg-c5c7378e-1d@if12:  mtu 1500 qdisc 
> pfifo_fast state UP group default qlen 1000
> link/ether fa:16:3e:b6:4c:97 brd ff:ff:ff:ff:ff:ff
> inet 147.210.240.11/23  scope global 
> qg-c5c7378e-1d
>valid_lft forever preferred_lft forever
> inet 147.210.240.12/32  scope global 
> qg-c5c7378e-1d
>valid_lft forever preferred_lft forever
> inet6 fe80::f816:3eff:feb6:4c97/64 scope link 
>valid_lft forever preferred_lft forever
> 
> Same result on infra02 and infra03, qr and qg interfaces have the same ip, 
> and ha interfaces the address 169.254.0.1.
> 
> If we stop 2 neutron agent containers (p-osinfra02, p-osinfra03) and we 
> restart the first (p-osinfra01), we can reboot the instance and we got an ip, 
> a floating ip and we can access by ssh from internet to the vm. (Note: after 
> few time, we loss our connectivity too).
> 
> But if we restart the two containers, we got a ha_state to « standby » until 
> the three become « active » and finally we have the problem again.
> 
> The three routers on infra 01/02/03 are seen as master.
> 
> If we ping from our instance to the router (internal network 192.168.100.4 to 
> 192.168.100.254) we can see some ARP Request
> ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
> ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
> ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
> 
> And on the compute node we see all these frames on the various interfaces tap 
> / vxlan

Re: [openstack-dev] [openstack-ansible] L3HA problem

2016-06-22 Thread Assaf Muller
On Wed, Jun 22, 2016 at 9:24 AM, fabrice grelaud
 wrote:
> Hi,
>
> we deployed our openstack infrastructure with your « exciting » project
> openstack-ansible (mitaka 13.1.2) but we have some problems with L3HA after
> create router.
>
> Our infra (closer to the doc):
> 3 controllers nodes (with bond0 (br-mgmt, br-storage), bond1 (br-vxlan,
> br-vlan))
> 2 compute nodes (same for network)
>
> We create an external network (vlan type), an internal network (vxlan type)
> and a router connected to both networks.
> And when we launch an instance (cirros), we can’t receive an ip on the vm.
>
> We have:
>
> root@p-osinfra03-utility-container-783041da:~# neutron
> l3-agent-list-hosting-router router-bim
> +--+---++---+--+
> | id   | host
> | admin_state_up | alive | ha_state |
> +--+---++---+--+
> | 3c7918e5-3ad6-4f82-a81b-700790e3c016 |
> p-osinfra01-neutron-agents-container-f1ab9c14 | True   | :-)   |
> active   |
> | f2bf385a-f210-4dbc-8d7d-4b7b845c09b0 |
> p-osinfra02-neutron-agents-container-48142ffe | True   | :-)   |
> active   |
> | 55350fac-16aa-488e-91fd-a7db38179c62 |
> p-osinfra03-neutron-agents-container-2f6557f0 | True   | :-)   |
> active   |
> +--+---++---+—+
>
> I know, i got a problem now because i should have :-) active, :-) standby,
> :-) standby… Snif...
>
> root@p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns
> qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6
> qdhcp-0ba266fb-15c4-4566-ae88-92d4c8fd2036
>
> root@p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns exec
> qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6 ip a sh
> 1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group
> default
> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> inet 127.0.0.1/8 scope host lo
>valid_lft forever preferred_lft forever
> inet6 ::1/128 scope host
>valid_lft forever preferred_lft forever
> 2: ha-4a5f0287-91@if6:  mtu 1450 qdisc
> pfifo_fast state UP group default qlen 1000
> link/ether fa:16:3e:c2:67:a9 brd ff:ff:ff:ff:ff:ff
> inet 169.254.192.1/18 brd 169.254.255.255 scope global ha-4a5f0287-91
>valid_lft forever preferred_lft forever
> inet 169.254.0.1/24 scope global ha-4a5f0287-91
>valid_lft forever preferred_lft forever
> inet6 fe80::f816:3eff:fec2:67a9/64 scope link
>valid_lft forever preferred_lft forever
> 3: qr-44804d69-88@if9:  mtu 1450 qdisc
> pfifo_fast state UP group default qlen 1000
> link/ether fa:16:3e:a5:8c:f2 brd ff:ff:ff:ff:ff:ff
> inet 192.168.100.254/24 scope global qr-44804d69-88
>valid_lft forever preferred_lft forever
> inet6 fe80::f816:3eff:fea5:8cf2/64 scope link
>valid_lft forever preferred_lft forever
> 4: qg-c5c7378e-1d@if12:  mtu 1500 qdisc
> pfifo_fast state UP group default qlen 1000
> link/ether fa:16:3e:b6:4c:97 brd ff:ff:ff:ff:ff:ff
> inet 147.210.240.11/23 scope global qg-c5c7378e-1d
>valid_lft forever preferred_lft forever
> inet 147.210.240.12/32 scope global qg-c5c7378e-1d
>valid_lft forever preferred_lft forever
> inet6 fe80::f816:3eff:feb6:4c97/64 scope link
>valid_lft forever preferred_lft forever
>
> Same result on infra02 and infra03, qr and qg interfaces have the same ip,
> and ha interfaces the address 169.254.0.1.
>
> If we stop 2 neutron agent containers (p-osinfra02, p-osinfra03) and we
> restart the first (p-osinfra01), we can reboot the instance and we got an
> ip, a floating ip and we can access by ssh from internet to the vm. (Note:
> after few time, we loss our connectivity too).
>
> But if we restart the two containers, we got a ha_state to « standby » until
> the three become « active » and finally we have the problem again.
>
> The three routers on infra 01/02/03 are seen as master.
>
> If we ping from our instance to the router (internal network 192.168.100.4
> to 192.168.100.254) we can see some ARP Request
> ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
> ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
> ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
>
> And on the compute node we see all these frames on the various interfaces
> tap / vxlan-89 / br-vxlan / bond1.vxlanvlan / bond1 / em2 but nothing back.
>
> We also have on ha interface, on each router, the VRRP communication
> (heartbeat packets over a hidden project network that connects all ha
> routers (vxlan 70) ) . Priori as normal, each router thinks to be master.
>
> root@p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns exec
> qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6 tcpdump -nl -i ha-4a5f0287-91
>

Re: [openstack-dev] [openstack-ansible] L3HA problem

2016-06-22 Thread Anna Kamyshnikova
Hi!

What Keepalived version is used?

On Wed, Jun 22, 2016 at 4:24 PM, fabrice grelaud <
fabrice.grel...@u-bordeaux.fr> wrote:

> Hi,
>
> we deployed our openstack infrastructure with your « exciting » project
> openstack-ansible (mitaka 13.1.2) but we have some problems with L3HA after
> create router.
>
> Our infra (closer to the doc):
> 3 controllers nodes (with bond0 (br-mgmt, br-storage), bond1 (br-vxlan,
> br-vlan))
> 2 compute nodes (same for network)
>
> We create an external network (vlan type), an internal network (vxlan
> type) and a router connected to both networks.
> And when we launch an instance (cirros), we can’t receive an ip on the vm.
>
> We have:
>
> root@p-osinfra03-utility-container-783041da:~# neutron
> l3-agent-list-hosting-router router-bim
>
> +--+---++---+--+
> | id   | host
>  | admin_state_up | alive | ha_state |
>
> +--+---++---+--+
> | 3c7918e5-3ad6-4f82-a81b-700790e3c016 |
> p-osinfra01-neutron-agents-container-f1ab9c14 | True   | :-)   |
> active   |
> | f2bf385a-f210-4dbc-8d7d-4b7b845c09b0 |
> p-osinfra02-neutron-agents-container-48142ffe | True   | :-)   |
> active   |
> | 55350fac-16aa-488e-91fd-a7db38179c62 |
> p-osinfra03-neutron-agents-container-2f6557f0 | True   | :-)   |
> active   |
>
> +--+---++---+—+
>
> I know, i got a problem now because i should have :-) active, :-) standby,
> :-) standby… Snif...
>
> root@p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns
> qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6
> qdhcp-0ba266fb-15c4-4566-ae88-92d4c8fd2036
>
> root@p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns exec
> qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6 ip a sh
> 1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group
> default
> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> inet 127.0.0.1/8 scope host lo
>valid_lft forever preferred_lft forever
> inet6 ::1/128 scope host
>valid_lft forever preferred_lft forever
> 2: ha-4a5f0287-91@if6:  mtu 1450 qdisc
> pfifo_fast state UP group default qlen 1000
> link/ether fa:16:3e:c2:67:a9 brd ff:ff:ff:ff:ff:ff
> inet 169.254.192.1/18 brd 169.254.255.255 scope global ha-4a5f0287-91
>valid_lft forever preferred_lft forever
> inet 169.254.0.1/24 scope global ha-4a5f0287-91
>valid_lft forever preferred_lft forever
> inet6 fe80::f816:3eff:fec2:67a9/64 scope link
>valid_lft forever preferred_lft forever
> 3: qr-44804d69-88@if9:  mtu 1450 qdisc
> pfifo_fast state UP group default qlen 1000
> link/ether fa:16:3e:a5:8c:f2 brd ff:ff:ff:ff:ff:ff
> inet 192.168.100.254/24 scope global qr-44804d69-88
>valid_lft forever preferred_lft forever
> inet6 fe80::f816:3eff:fea5:8cf2/64 scope link
>valid_lft forever preferred_lft forever
> 4: qg-c5c7378e-1d@if12:  mtu 1500 qdisc
> pfifo_fast state UP group default qlen 1000
> link/ether fa:16:3e:b6:4c:97 brd ff:ff:ff:ff:ff:ff
> inet 147.210.240.11/23 scope global qg-c5c7378e-1d
>valid_lft forever preferred_lft forever
> inet 147.210.240.12/32 scope global qg-c5c7378e-1d
>valid_lft forever preferred_lft forever
> inet6 fe80::f816:3eff:feb6:4c97/64 scope link
>valid_lft forever preferred_lft forever
>
> Same result on infra02 and infra03, qr and qg interfaces have the same ip,
> and ha interfaces the address 169.254.0.1.
>
> If we stop 2 neutron agent containers (p-osinfra02, p-osinfra03) and we
> restart the first (p-osinfra01), we can reboot the instance and we got an
> ip, a floating ip and we can access by ssh from internet to the vm. (Note:
> after few time, we loss our connectivity too).
>
> But if we restart the two containers, we got a ha_state to « standby »
> until the three become « active » and finally we have the problem again.
>
> The three routers on infra 01/02/03 are seen as master.
>
> If we ping from our instance to the router (internal network 192.168.100.4
> to 192.168.100.254) we can see some ARP Request
> ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
> ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
> ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
>
> And on the compute node we see all these frames on the various interfaces
> tap / vxlan-89 / br-vxlan / bond1.vxlanvlan / bond1 / em2 but nothing back.
>
> We also have on ha interface, on each router, the VRRP communication
> (heartbeat packets over a hidden project network that connects all ha
> routers (vxlan 70) ) . Priori as normal, each router thinks to be master.
>
> root@p-osinfra01-neutron-agents-container-f1ab9c14:~# i

[openstack-dev] [openstack-ansible] L3HA problem

2016-06-22 Thread fabrice grelaud
Hi,

we deployed our OpenStack infrastructure with your « exciting » project
openstack-ansible (Mitaka 13.1.2), but we have some problems with L3HA after
creating a router.

Our infrastructure (close to the doc):
3 controller nodes (with bond0 (br-mgmt, br-storage) and bond1 (br-vxlan, br-vlan))
2 compute nodes (same network layout)

We created an external network (vlan type), an internal network (vxlan type) and
a router connected to both networks.
When we launch an instance (CirrOS), the VM does not receive an IP.
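
For reference, the networks and router were created roughly like this (a sketch
from memory; the provider names and <VLAN_ID> are illustrative, not the exact
values we used):

neutron net-create ext-net --router:external \
  --provider:network_type vlan --provider:physical_network vlan \
  --provider:segmentation_id <VLAN_ID>
neutron subnet-create ext-net 147.210.240.0/23 --name ext-subnet --disable-dhcp
neutron net-create int-net
neutron subnet-create int-net 192.168.100.0/24 --name int-subnet
neutron router-create router-bim --ha True
neutron router-gateway-set router-bim ext-net
neutron router-interface-add router-bim int-subnet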

We have:

root@p-osinfra03-utility-container-783041da:~# neutron l3-agent-list-hosting-router router-bim
+--------------------------------------+-----------------------------------------------+----------------+-------+----------+
| id                                   | host                                          | admin_state_up | alive | ha_state |
+--------------------------------------+-----------------------------------------------+----------------+-------+----------+
| 3c7918e5-3ad6-4f82-a81b-700790e3c016 | p-osinfra01-neutron-agents-container-f1ab9c14 | True           | :-)   | active   |
| f2bf385a-f210-4dbc-8d7d-4b7b845c09b0 | p-osinfra02-neutron-agents-container-48142ffe | True           | :-)   | active   |
| 55350fac-16aa-488e-91fd-a7db38179c62 | p-osinfra03-neutron-agents-container-2f6557f0 | True           | :-)   | active   |
+--------------------------------------+-----------------------------------------------+----------------+-------+----------+

I know I have a problem here, because I should see :-) active, :-) standby,
:-) standby… Snif...
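
For what it's worth, keepalived's own view of the election can also be read in
each agents container (a sketch; the path assumes neutron's default state_path
of /var/lib/neutron):

# run inside each p-osinfra0X-neutron-agents-container-*
cat /var/lib/neutron/ha_confs/eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6/state
# a healthy HA router answers "master" on exactly one agent and "backup" on the others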

root@p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns
qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6
qdhcp-0ba266fb-15c4-4566-ae88-92d4c8fd2036

root@p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns exec 
qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6 ip a sh
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default 
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
   valid_lft forever preferred_lft forever
inet6 ::1/128 scope host 
   valid_lft forever preferred_lft forever
2: ha-4a5f0287-91@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UP group default qlen 1000
link/ether fa:16:3e:c2:67:a9 brd ff:ff:ff:ff:ff:ff
inet 169.254.192.1/18 brd 169.254.255.255 scope global ha-4a5f0287-91
   valid_lft forever preferred_lft forever
inet 169.254.0.1/24 scope global ha-4a5f0287-91
   valid_lft forever preferred_lft forever
inet6 fe80::f816:3eff:fec2:67a9/64 scope link 
   valid_lft forever preferred_lft forever
3: qr-44804d69-88@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UP group default qlen 1000
link/ether fa:16:3e:a5:8c:f2 brd ff:ff:ff:ff:ff:ff
inet 192.168.100.254/24 scope global qr-44804d69-88
   valid_lft forever preferred_lft forever
inet6 fe80::f816:3eff:fea5:8cf2/64 scope link 
   valid_lft forever preferred_lft forever
4: qg-c5c7378e-1d@if12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether fa:16:3e:b6:4c:97 brd ff:ff:ff:ff:ff:ff
inet 147.210.240.11/23 scope global qg-c5c7378e-1d
   valid_lft forever preferred_lft forever
inet 147.210.240.12/32 scope global qg-c5c7378e-1d
   valid_lft forever preferred_lft forever
inet6 fe80::f816:3eff:feb6:4c97/64 scope link 
   valid_lft forever preferred_lft forever

Same result on infra02 and infra03: the qr and qg interfaces carry the same IPs,
and the ha interfaces all carry the address 169.254.0.1.

If we stop two neutron agents containers (p-osinfra02, p-osinfra03) and restart
the first (p-osinfra01), we can reboot the instance and it gets an IP and a
floating IP, and we can reach the VM over SSH from the internet. (Note: after a
while, we lose that connectivity too.)

But if we restart the two containers, their ha_state goes to « standby » until
all three become « active », and then we have the problem again.

The three router instances on infra01/02/03 are all seen as master.

If we ping from our instance to the router (internal network, 192.168.100.4 to
192.168.100.254), we see ARP requests:
ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28

And on the compute node we see these frames on the various interfaces (tap /
vxlan-89 / br-vxlan / bond1.vxlanvlan / bond1 / em2), but nothing comes back.
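
The kind of capture we used to follow the ARP request toward the router (a
sketch reusing the interface and namespace names shown above):

# inside the router namespace on an agents container: does the request reach qr- ?
ip netns exec qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6 \
  tcpdump -nei qr-44804d69-88 arp
# on the compute node, the same request on its way out
tcpdump -nei br-vxlan 'arp and host 192.168.100.254'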

We also see, on the ha interface of each router, the VRRP communication
(heartbeat packets over a hidden project network that connects all HA routers
(vxlan 70)). A priori this looks normal, yet each router thinks it is the master.
  
root@p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns exec 
qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6 tcpdump -nl -i ha-4a5f0287-91
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ha-4a5f0287-91, link-type EN10MB (Ethernet), capture size 65535 
bytes
IP 169.254.192.
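
(What that capture is meant to show, as a sketch: filter the ha interface for
VRRP, e.g.

ip netns exec qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6 \
  tcpdump -nei ha-4a5f0287-91 'ip proto 112'   # 112 = VRRP

and compare the sources. Advertisements go to 224.0.0.18; if each namespace only
ever sees its own 169.254.192.x address and never the two other routers, the
keepalived instances cannot hear each other over the HA vxlan network, so each
one elects itself master, which matches the three « active » states above.)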