On Wed, Nov 7, 2012 at 5:52 PM, Gary Kotton <[email protected]> wrote:
> On 11/07/2012 11:47 AM, Aniruddha Khadkikar wrote:
>>
>> Hi Stackers,
>>
>> We have a small Openstack lab using three servers. The components are
>> distributed as:
>> 1. Network controller - Quantum L3 & DHCP, L2 agent, Nova, Openvswitch
>> 2. Cloud controller - Quantum server, L2 agent, Nova, Openvswitch,
>>    Dashboard, API, MySQL, Rabbitmq
>> 3. Compute node - Nova, Openvswitch, L2 agent
>>
>> The network is set up in the following way:
>> 1. Each server has 4 NICs. We are using only one public IP and one
>>    private IP for the Openstack setup. We have a private switch for
>>    inter-VM communication
>> 2. We are using GRE tunnelling and openvswitch
>> 3. br-int is assigned an IP address
>> 4. br-ex is configured for floating IP allocation
>>
>> Everything works perfectly when we are setting it up from scratch!!!!
>>
>> Each VM gets its private IP assigned, the NAT-based floating IP is
>> also assigned, and we are able to SSH into it.
>> The VMs also get created on all three hosts.
>>
>> So we are confident that we have the right configurations in place, as
>> we have a fully operational Openstack implementation using GRE tunnels.
>>
>> In order to test the resilience of the setup, we decided to reboot the
>> servers to see if everything comes up again. We faced some service
>> dependency errors, and after the server reboot we restarted the services
>> in the proper order, i.e. on the cloud controller we started mysql,
>> rabbitmq, keystone, openvswitch and quantum-server. This was followed by
>> starting openvswitch, L3, dhcp and the L2 agent. After that we started
>> the L2 agents on all the remaining servers, followed by nova. There is
>> some confusion on how to orchestrate the right order of services. This
>> could possibly be something we will need to work on in future.
>>
>> After this, we have nova working properly, i.e. we are able to create
>> VMs and the pre-existing ones are also started (virsh list also shows
>> the VMs). ovsctl shows all the interfaces as earlier. However, we are
>> unable to access the VMs. On logging into a VM we do not see any IP
>> address assigned, as the VM is unable to contact the DHCP server.
>>
>> The questions that come up are:
>> * What could change after a reboot that would compromise a running
>>   network configuration?
>> * Could there be issues with the TAP interfaces created? What is the
>>   best way to troubleshoot such a situation?
>> * Has anyone seen similar behaviour, and is it specific to when we
>>   use GRE tunnels? Is it then specific to openvswitch, which we are
>>   using?
>> * On reboot of the network controller, are any steps required to ensure
>>   that Openstack continues to function properly?
>
>
> Can you please look in the log files for Quantum and see if there are any
> errors?
>
> There is an open issue with Quantum and QPID after rebooting - the Quantum
> service hangs. On the host for Quantum, if you do "netstat -an |grep 9696",
> do you see anything?
>
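The netstat check suggested above can also be scripted, for example to run it after every reboot. This is only a minimal sketch: the port 9696 is taken from the netstat command quoted above, while the function name `is_port_open` and the choice of host are illustrative assumptions. A successful connect only shows that something is listening on the port, not that the Quantum service is actually answering requests.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        # create_connection raises OSError (refused, unreachable, timeout)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

On the Quantum host this could be called as `is_port_open("127.0.0.1", 9696)`, assuming the server is bound to an address reachable from localhost.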
Unfortunately we have recreated the cloud again. This time, however, we have
not assigned an IP to the br-int interface. It is working currently, and we
will do the reboot today. By evening I will provide details of the errors.

In the syslog on the network node we started seeing a lot of:

Nov 7 12:59:30 dnsmasq-dhcp[5722]: last message repeated 3 times
Nov 7 12:59:30 us000901 dnsmasq-dhcp[5746]: DHCPDISCOVER(tap224fcabc-70) fa:16:3e:52:38:ce
Nov 7 12:59:30 us000901 dnsmasq-dhcp[5722]: DHCPDISCOVER(tap7736e97e-5c) fa:16:3e:52:38:ce no address available
Nov 7 12:59:30 us000901 dnsmasq-dhcp[5746]: DHCPOFFER(tap224fcabc-70) 172.24.2.11 fa:16:3e:52:38:ce
Nov 7 12:59:30 us000901 dnsmasq-dhcp[5722]: DHCPDISCOVER(tap7736e97e-5c) fa:16:3e:52:38:ce no address available
Nov 7 12:59:39 us000901 dnsmasq-dhcp[5722]: DHCPDISCOVER(tap7736e97e-5c) fa:16:3e:52:38:ce no address available
Nov 7 12:59:39 us000901 dnsmasq-dhcp[5746]: DHCPDISCOVER(tap224fcabc-70) fa:16:3e:52:38:ce
Nov 7 12:59:39 us000901 dnsmasq-dhcp[5746]: DHCPOFFER(tap224fcabc-70) 172.24.2.11 fa:16:3e:52:38:ce
Nov 7 12:59:57 us000901 dnsmasq-dhcp[5722]: DHCPDISCOVER(tap7736e97e-5c) fa:16:3e:52:38:ce no address available
Nov 7 12:59:57 us000901 dnsmasq-dhcp[5746]: DHCPDISCOVER(tap224fcabc-70) fa:16:3e:52:38:ce
Nov 7 12:59:57 us000901 dnsmasq-dhcp[5746]: DHCPOFFER(tap224fcabc-70) 172.24.2.11 fa:16:3e:52:38:ce

The above messages are associated with nearly 100% CPU usage for the kvm
processes and dnsmasq.

The relevant part of the Quantum DHCP log is at http://pastebin.com/GmksGeK6

Regards
Aniruddha

>>
>> The setup has failed twice on reboot. For the second iteration we are
>> assigning the IP on startup to br-int so that openvswitch does not
>> give errors.
>>
>> Regards
>> Aniruddha
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~openstack
>> Post to     : [email protected]
>> Unsubscribe : https://launchpad.net/~openstack
>> More help   : https://help.launchpad.net/ListHelp
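In the dnsmasq syslog excerpt in this message, the same MAC address (fa:16:3e:52:38:ce) is seen by two dnsmasq processes on two different tap interfaces: one answers with a DHCPOFFER and the other reports "no address available". With a longer log this is easier to see as a per-process summary. The following is a rough sketch for that, assuming the syslog format shown in the excerpt; the names `LINE_RE` and `summarize` are illustrative and not part of any Quantum or dnsmasq tooling. It only counts events and does not diagnose the cause.

```python
import re
from collections import Counter, defaultdict

# Matches dnsmasq-dhcp syslog lines such as:
#   ... dnsmasq-dhcp[5722]: DHCPDISCOVER(tap7736e97e-5c) fa:16:... no address available
LINE_RE = re.compile(
    r"dnsmasq-dhcp\[(?P<pid>\d+)\]: "
    r"(?P<event>DHCP[A-Z]+)\((?P<iface>[^)]+)\)"
    r"(?P<rest>.*)"
)


def summarize(lines):
    """Count DHCP events per (dnsmasq pid, interface).

    Events whose line ends in 'no address available' are counted
    under a separate label so they stand out.
    """
    summary = defaultdict(Counter)
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skips lines like "last message repeated 3 times"
        key = (int(match.group("pid")), match.group("iface"))
        event = match.group("event")
        if "no address available" in match.group("rest"):
            event += " (no address available)"
        summary[key][event] += 1
    return summary
```

Feeding it the lines of the syslog file, for example `summarize(open(path))` with `path` pointing at the syslog, gives one counter per dnsmasq process and tap interface, which shows at a glance which instance is refusing addresses.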

