Re: [Openstack] dnsmasq stops talking to instances?

2012-10-26 Thread Lars Kellogg-Stedman
On Fri, Oct 19, 2012 at 10:24:20AM -0400, Lars Kellogg-Stedman wrote:
 It happened again last night -- which means we were without networking
 on our instances for about seven hours -- and restarting nova-network
 doesn't resolve the problem.  It is necessary to first kill dnsmasq
 (and allow nova-network to restart it).

In case folks were curious:

I'm pretty sure this was a bad interaction between dhclient on the
host and the interface being used for instance networking.  We've been
running stably now for a week.
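
A minimal sketch of how one might check for this condition, assuming the
process list is available as plain command lines (for example from
`ps -eo args`). The function name and the interface name `br662` in the
note below are illustrative assumptions, not something taken from the
nova tooling.

```python
import os
import shlex


def dhclient_on_interface(command_lines, interface):
    """Return the command lines that look like dhclient managing `interface`.

    `command_lines` is any iterable of process command lines; `interface`
    is the name of the interface used for instance networking.
    """
    matches = []
    for line in command_lines:
        try:
            args = shlex.split(line)
        except ValueError:
            # Unbalanced quotes in a command line; fall back to a plain split.
            args = line.split()
        if not args:
            continue
        # Match on the executable name and on the interface appearing as
        # one of its arguments.
        if os.path.basename(args[0]) == "dhclient" and interface in args[1:]:
            matches.append(line)
    return matches
```

Calling `dhclient_on_interface(lines, "br662")` with the host's process
list would return any dhclient invocations bound to that bridge; an empty
list suggests dhclient is not managing it.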

-- 
Lars Kellogg-Stedman l...@seas.harvard.edu  |
Senior Technologist   | http://ac.seas.harvard.edu/
Academic Computing| http://code.seas.harvard.edu/
Harvard School of Engineering |
  and Applied Sciences|


___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] dnsmasq stops talking to instances?

2012-10-26 Thread Lars Kellogg-Stedman
On Mon, Oct 22, 2012 at 01:54:11PM +0200, Gary Kotton wrote:
 Can you please explain the problems that you had with qpid?

OpenStack components were periodically losing touch with each other.
Requests to boot/delete an instance, for example, would never make it
as far as the compute hosts.  They would get stuck scheduling.

Initially we thought this was exclusively a problem with the network
firewall infrastructure (there was a default 1 hour idle connection
timeout), but reconfiguring our OpenStack environment to remove the
firewalls from the picture did not resolve this problem.

Since replacing qpid with rabbitmq, we have not had a single
recurrence of this behavior.

-- 
Lars Kellogg-Stedman l...@seas.harvard.edu  |
Senior Technologist   | http://ac.seas.harvard.edu/
Academic Computing| http://code.seas.harvard.edu/
Harvard School of Engineering |
  and Applied Sciences|




Re: [Openstack] dnsmasq stops talking to instances?

2012-10-22 Thread Gary Kotton

On 10/18/2012 05:42 PM, Lars Kellogg-Stedman wrote:
 The good news is that since replacing qpid with rabbitmq our
 environment seems to have stabilized to the point that it's *almost*
 useful.

Can you please explain the problems that you had with qpid?

 The last remaining issue is that dnsmasq will occasionally stop
 responding to instances.  Killing dnsmasq and restarting
 openstack-nova-network makes things work again, but I haven't been
 able to figure out why dnsmasq stops responding in the first place.

 Has anyone seen this behavior before?  Any pointers would be greatly
 appreciated.

Are you using the traditional nova networking or Quantum?
Thanks
Gary







Re: [Openstack] dnsmasq stops talking to instances?

2012-10-19 Thread Lars Kellogg-Stedman
On Thu, Oct 18, 2012 at 06:16:07PM +0100, Ronivon Costa wrote:
 I have noticed a similar behaviour, for example when the switch/router is
 rebooted. I am able to recover communication with the VMs by restarting
 nova-network (no need to kill dnsmasq).

There are no network devices being rebooted here...and since we're
running in multi_host mode, both dnsmasq and the affected instances
are running *on the same physical system*.

It happened again last night -- which means we were without networking
on our instances for about seven hours -- and restarting nova-network
doesn't resolve the problem.  It is necessary to first kill dnsmasq
(and allow nova-network to restart it).

There are no errors being logged by dnsmasq; starting just after 2 AM,
all of the DHCPREQUEST ... traffic just stops, and the logs after
that point look like this:

Oct 19 02:02:34 stack-1 dnsmasq[32013]: read /etc/hosts - 2 addresses
Oct 19 02:02:34 stack-1 dnsmasq[32013]: read /var/lib/nova/networks/nova-br662.conf
Oct 19 02:02:35 stack-1 dnsmasq[32013]: read /etc/hosts - 2 addresses
Oct 19 02:02:35 stack-1 dnsmasq[32013]: read /var/lib/nova/networks/nova-br662.conf
Oct 19 02:03:12 stack-1 dnsmasq[32013]: read /etc/hosts - 2 addresses
Oct 19 02:03:12 stack-1 dnsmasq[32013]: read /var/lib/nova/networks/nova-br662.conf

...until I restart things.
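
To pin down when the DHCP traffic stopped, something like the following
could be run over the syslog file. This is only a sketch: it assumes
classic syslog timestamps ("Oct 19 02:02:34") and simply looks for the
string DHCPREQUEST anywhere in a line. The function name and the `year`
parameter are illustrative; syslog lines do not carry a year, so the
caller has to supply one.

```python
from datetime import datetime


def last_dhcprequest(lines, year=2012):
    """Return the timestamp of the last line mentioning DHCPREQUEST.

    "Last" means last in file order. Returns None when no such line
    is found.
    """
    last = None
    for line in lines:
        if "DHCPREQUEST" not in line:
            continue
        # The first three whitespace-separated fields are month, day, time.
        stamp = " ".join(line.split()[:3])
        try:
            last = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        except ValueError:
            # Not a syslog-style line; ignore it.
            continue
    return last
```

Feeding it the lines of the log file (for example via
`open(path).readlines()`, with `path` pointing at wherever dnsmasq logs
on the host) gives the time of the last request that dnsmasq logged.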

-- 
Lars Kellogg-Stedman l...@seas.harvard.edu  |
Senior Technologist   | http://ac.seas.harvard.edu/
Academic Computing| http://code.seas.harvard.edu/
Harvard School of Engineering |
  and Applied Sciences|




Re: [Openstack] dnsmasq stops talking to instances?

2012-10-19 Thread Nathanael Burton
I've noticed similar behavior where dnsmasq stops working if nova-network
is restarted without first killing all the dnsmasq processes.
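
The ordering described here (kill dnsmasq first, then restart the network
service) can be sketched as below. The service name
`openstack-nova-network` is the one used earlier in this thread; on other
distributions it may differ, and the helper names are made up for
illustration. Note that `pkill dnsmasq` kills every dnsmasq process on the
host, including any that are not managed by nova.

```python
import subprocess


def recovery_commands(service="openstack-nova-network"):
    """Return the commands in the order described: kill, then restart."""
    return [
        ["pkill", "dnsmasq"],
        ["service", service, "restart"],
    ]


def recover(service="openstack-nova-network", dry_run=True):
    """Run (or, by default, just print) the recovery commands in order."""
    for cmd in recovery_commands(service):
        if dry_run:
            print(" ".join(cmd))
        else:
            # pkill exits non-zero when no process matched, so failures
            # are not treated as fatal; the restart still goes ahead.
            subprocess.run(cmd, check=False)
```

With the default `dry_run=True`, `recover()` only prints the two commands,
which makes it safe to check the order before running anything as root.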


Re: [Openstack] dnsmasq stops talking to instances?

2012-10-19 Thread Joe Topjian
Hi Lars,

 There are no errors being logged by dnsmasq; starting just after 2 AM,
 all of the DHCPREQUEST ... traffic just stops, and the logs after
 that point look like this:


We ran into similar issues that turned out to be a qemu bug:

https://bugs.launchpad.net/ubuntu/+source/qemu-kvm/+bug/997978

The fixed qemu-kvm package is now in the main Ubuntu repository (assuming
you are using Ubuntu 12.04), so upgrading it may resolve your issue
quickly. Note that upgrading does not affect currently running instances,
which means only newly launched instances will pick up the fix.
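
Since running instances keep using the old qemu-kvm binary, it can help to
list which ones were started before the upgrade. A minimal sketch,
assuming you already have each instance's start time and the time the
package was upgraded; how those are collected is left open, and the
function name is illustrative.

```python
from datetime import datetime


def needs_relaunch(instance_start_times, upgrade_time):
    """Return the names of instances started before the upgrade.

    `instance_start_times` maps an instance name to the datetime its
    qemu process was started; `upgrade_time` is when the fixed package
    was installed.
    """
    return sorted(
        name
        for name, started in instance_start_times.items()
        if started < upgrade_time
    )
```

Any instance the function returns is still running on the old binary and
would need to be relaunched to get the fix.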


 Oct 19 02:02:34 stack-1 dnsmasq[32013]: read /etc/hosts - 2 addresses
 Oct 19 02:02:34 stack-1 dnsmasq[32013]: read /var/lib/nova/networks/nova-br662.conf
 Oct 19 02:02:35 stack-1 dnsmasq[32013]: read /etc/hosts - 2 addresses
 Oct 19 02:02:35 stack-1 dnsmasq[32013]: read /var/lib/nova/networks/nova-br662.conf
 Oct 19 02:03:12 stack-1 dnsmasq[32013]: read /etc/hosts - 2 addresses
 Oct 19 02:03:12 stack-1 dnsmasq[32013]: read /var/lib/nova/networks/nova-br662.conf

 ...until I restart things.


For us, this included restarting the instance, since it lost its IP address
from lack of DHCP traffic. While we were troubleshooting the issue, we
ended up adding a local account to each instance so we could log in through
the VNC console and restart networking services.

Thanks,
Joe


-- 
Joe Topjian
Systems Administrator
Cybera Inc.

www.cybera.ca

Cybera is a not-for-profit organization that works to spur and support
innovation, for the economic benefit of Alberta, through the use
of cyberinfrastructure.


[Openstack] dnsmasq stops talking to instances?

2012-10-18 Thread Lars Kellogg-Stedman
The good news is that since replacing qpid with rabbitmq our
environment seems to have stabilized to the point that it's *almost*
useful.

The last remaining issue is that dnsmasq will occasionally stop
responding to instances.  Killing dnsmasq and restarting
openstack-nova-network makes things work again, but I haven't been
able to figure out why dnsmasq stops responding in the first place.

Has anyone seen this behavior before?  Any pointers would be greatly
appreciated.

-- 
Lars Kellogg-Stedman l...@seas.harvard.edu  |
Senior Technologist   | http://ac.seas.harvard.edu/
Academic Computing| http://code.seas.harvard.edu/
Harvard School of Engineering |
  and Applied Sciences|

