Re: [Openstack] dnsmasq stops talking to instances?
On Fri, Oct 19, 2012 at 10:24:20AM -0400, Lars Kellogg-Stedman wrote:
> It happened again last night -- which means we were without networking on our instances for about seven hours -- and restarting nova-network doesn't resolve the problem. It is necessary to first kill dnsmasq (and allow nova-network to restart it).

In case folks were curious: I'm pretty sure this was a bad interaction between dhclient on the host and the interface being used for instance networking. We've been running stably now for a week.

--
Lars Kellogg-Stedman l...@seas.harvard.edu
Senior Technologist, Academic Computing
Harvard School of Engineering and Applied Sciences
http://ac.seas.harvard.edu/ | http://code.seas.harvard.edu/

___
Mailing list: https://launchpad.net/~openstack
Post to     : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp
Re: [Openstack] dnsmasq stops talking to instances?
On Mon, Oct 22, 2012 at 01:54:11PM +0200, Gary Kotton wrote:
> Can you please explain the problems that you had with qpid?

OpenStack components were periodically losing touch with each other. Requests to boot/delete an instance, for example, would never make it as far as the compute hosts. They would get stuck scheduling.

Initially we thought this was exclusively a problem with the network firewall infrastructure (there was a default 1 hour idle connection timeout), but reconfiguring our OpenStack environment to remove the firewalls from the picture did not resolve this problem.

Since replacing qpid with rabbitmq, we have not had a single recurrence of this behavior.

--
Lars Kellogg-Stedman
Re: [Openstack] dnsmasq stops talking to instances?
On 10/18/2012 05:42 PM, Lars Kellogg-Stedman wrote:
> The good news is that since replacing qpid with rabbitmq our environment seems to have stabilized to the point that it's *almost* useful.

Can you please explain the problems that you had with qpid?

> The last remaining issue is that dnsmasq will occasionally stop responding to instances. Killing dnsmasq and restarting openstack-nova-network makes things work again, but I haven't been able to figure out why dnsmasq stops responding in the first place.
>
> Has anyone seen this behavior before? Any pointers would be greatly appreciated.

Are you using the traditional nova networking or Quantum?

Thanks
Gary
Re: [Openstack] dnsmasq stops talking to instances?
On Thu, Oct 18, 2012 at 06:16:07PM +0100, Ronivon Costa wrote:
> I have noticed a similar behaviour, for example when the switch/router is rebooted. I am able to recover the communications with the VMs restarting nova network (no need to kill dnsmasq).

There are no network devices being rebooted here...and since we're running in multi_host mode, both dnsmasq and the affected instances are running *on the same physical system*.

It happened again last night -- which means we were without networking on our instances for about seven hours -- and restarting nova-network doesn't resolve the problem. It is necessary to first kill dnsmasq (and allow nova-network to restart it).

There are no errors being logged by dnsmasq; starting just after 2AM, all of the DHCPREQUEST ... traffic just stops, and the logs after that point look like this:

  Oct 19 02:02:34 stack-1 dnsmasq[32013]: read /etc/hosts - 2 addresses
  Oct 19 02:02:34 stack-1 dnsmasq[32013]: read /var/lib/nova/networks/nova-br662.conf
  Oct 19 02:02:35 stack-1 dnsmasq[32013]: read /etc/hosts - 2 addresses
  Oct 19 02:02:35 stack-1 dnsmasq[32013]: read /var/lib/nova/networks/nova-br662.conf
  Oct 19 02:03:12 stack-1 dnsmasq[32013]: read /etc/hosts - 2 addresses
  Oct 19 02:03:12 stack-1 dnsmasq[32013]: read /var/lib/nova/networks/nova-br662.conf

...until I restart things.

--
Lars Kellogg-Stedman
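One way to pin down when the DHCP traffic stopped is to scan the syslog for the last line that mentions DHCPREQUEST. The following is a minimal Python sketch, not something posted in the thread. It assumes the syslog prefix shown in the excerpt above and that DHCP requests are logged with the literal text DHCPREQUEST. The year is passed in by the caller because the syslog timestamps omit it. The function name and the sample request line are hypothetical.

```python
import re
from datetime import datetime

# Syslog prefix as in the excerpt: "Oct 19 02:02:34 stack-1 dnsmasq[32013]: ..."
LINE_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ "
    r"(?P<prog>dnsmasq[\w-]*)\[\d+\]: (?P<msg>.*)$"
)


def last_dhcp_request(lines, year=2012):
    """Return the time of the last dnsmasq line mentioning DHCPREQUEST, or None."""
    last = None
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match and "DHCPREQUEST" in match.group("msg"):
            # Syslog timestamps carry no year, so the caller supplies one.
            last = datetime.strptime(
                f"{year} {match.group('ts')}", "%Y %b %d %H:%M:%S"
            )
    return last


# The first line is a hypothetical request entry; the other two are from the excerpt.
SAMPLE = [
    "Oct 19 02:01:58 stack-1 dnsmasq-dhcp[32013]: DHCPREQUEST(br662) 10.0.0.5 fa:16:3e:00:00:01",
    "Oct 19 02:02:34 stack-1 dnsmasq[32013]: read /etc/hosts - 2 addresses",
    "Oct 19 02:02:34 stack-1 dnsmasq[32013]: read /var/lib/nova/networks/nova-br662.conf",
]

print(last_dhcp_request(SAMPLE))
```

Comparing the returned timestamp with the time of the latest log line gives a rough measure of how long dnsmasq has been silent on DHCP while still re-reading its configuration.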
Re: [Openstack] dnsmasq stops talking to instances?
I've noticed similar behavior where dnsmasq stops working if nova-network is restarted without first killing all the dnsmasq processes.

On Oct 19, 2012 10:24 AM, Lars Kellogg-Stedman l...@seas.harvard.edu wrote:
> On Thu, Oct 18, 2012 at 06:16:07PM +0100, Ronivon Costa wrote:
> > I have noticed a similar behaviour, for example when the switch/router is rebooted. I am able to recover the communications with the VMs restarting nova network (no need to kill dnsmasq).
>
> There are no network devices being rebooted here...and since we're running in multi_host mode, both dnsmasq and the affected instances are running *on the same physical system*.
>
> It happened again last night -- which means we were without networking on our instances for about seven hours -- and restarting nova-network doesn't resolve the problem. It is necessary to first kill dnsmasq (and allow nova-network to restart it).
>
> There are no errors being logged by dnsmasq; starting just after 2AM, all of the DHCPREQUEST ... traffic just stops, and the logs after that point look like this:
>
>   Oct 19 02:02:34 stack-1 dnsmasq[32013]: read /etc/hosts - 2 addresses
>   Oct 19 02:02:34 stack-1 dnsmasq[32013]: read /var/lib/nova/networks/nova-br662.conf
>   Oct 19 02:02:35 stack-1 dnsmasq[32013]: read /etc/hosts - 2 addresses
>   Oct 19 02:02:35 stack-1 dnsmasq[32013]: read /var/lib/nova/networks/nova-br662.conf
>   Oct 19 02:03:12 stack-1 dnsmasq[32013]: read /etc/hosts - 2 addresses
>   Oct 19 02:03:12 stack-1 dnsmasq[32013]: read /var/lib/nova/networks/nova-br662.conf
>
> ...until I restart things.
Re: [Openstack] dnsmasq stops talking to instances?
Hi Lars,

> There are no errors being logged by dnsmasq; starting just after 2AM, all of the DHCPREQUEST ... traffic just stops, and the logs after that point look like this:

We ran into similar issues that turned out to be a qemu bug:

https://bugs.launchpad.net/ubuntu/+source/qemu-kvm/+bug/997978

The fixed qemu-kvm package is now in the main Ubuntu repository (assuming you are using Ubuntu 12.04), so upgrading it may resolve your issue quickly. Note that the upgrade does not affect currently running instances, so only newly launched instances will get the fix.

>   Oct 19 02:02:34 stack-1 dnsmasq[32013]: read /etc/hosts - 2 addresses
>   Oct 19 02:02:34 stack-1 dnsmasq[32013]: read /var/lib/nova/networks/nova-br662.conf
>   Oct 19 02:02:35 stack-1 dnsmasq[32013]: read /etc/hosts - 2 addresses
>   Oct 19 02:02:35 stack-1 dnsmasq[32013]: read /var/lib/nova/networks/nova-br662.conf
>   Oct 19 02:03:12 stack-1 dnsmasq[32013]: read /etc/hosts - 2 addresses
>   Oct 19 02:03:12 stack-1 dnsmasq[32013]: read /var/lib/nova/networks/nova-br662.conf
>
> ...until I restart things.

For us, this included restarting the instance, since it had lost its IP address for lack of DHCP traffic. While we were troubleshooting the issue, we ended up adding a local account to each instance so that we could log in through the VNC console and restart networking services.

Thanks,
Joe

--
Joe Topjian
Systems Administrator
Cybera Inc.
www.cybera.ca

Cybera is a not-for-profit organization that works to spur and support innovation, for the economic benefit of Alberta, through the use of cyberinfrastructure.
[Openstack] dnsmasq stops talking to instances?
The good news is that since replacing qpid with rabbitmq our environment seems to have stabilized to the point that it's *almost* useful.

The last remaining issue is that dnsmasq will occasionally stop responding to instances. Killing dnsmasq and restarting openstack-nova-network makes things work again, but I haven't been able to figure out why dnsmasq stops responding in the first place.

Has anyone seen this behavior before? Any pointers would be greatly appreciated.

--
Lars Kellogg-Stedman
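The workaround described in this message (kill dnsmasq, then restart openstack-nova-network so that it starts a fresh dnsmasq) could be scripted roughly as follows. This is only a sketch under assumptions: the thread does not say which commands were actually run, so the use of `pkill` and `service ... restart` is a guess at a typical setup, and the function names are hypothetical. Only the service name comes from the message itself. By default the script prints the commands instead of running them.

```python
import subprocess


def recovery_commands(service="openstack-nova-network"):
    """Return the workaround as argv lists: kill dnsmasq first, then restart the service."""
    return [
        ["pkill", "dnsmasq"],            # assumption: no other dnsmasq on this host matters
        ["service", service, "restart"],  # assumption: SysV-style service management
    ]


def recover(dry_run=True):
    """Print the commands (dry run) or execute them in order."""
    for cmd in recovery_commands():
        if dry_run:
            print(" ".join(cmd))
        else:
            # pkill exits non-zero when nothing matched, so failures are not raised here.
            subprocess.run(cmd, check=False)


recover(dry_run=True)
```

Note that `pkill dnsmasq` matches every dnsmasq process on the host, including any started by something other than nova-network, so a narrower match may be preferable on a shared machine.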