Re: [Openstack] instances losing IP address while running, due to No DHCPOFFER

2012-06-27 Thread Tom Sante
Hey,

I seem to have the same issue with our VMs. I commented (comment #7) on a bug report that seems to correspond to our DHCP issues:
https://bugs.launchpad.net/nova/+bug/887162

If you are still affected by this issue, please report it on the bug page so the developers can look into a fix.

Regards,



Re: [Openstack] instances losing IP address while running, due to No DHCPOFFER

2012-06-15 Thread Christian Parpart
Hey all,

it just happened twice again, both times today, the last one at 22:00 UTC, with the following in nova-network's syslog:

root@gw1:/var/log# grep 'dnsmasq.*10889' daemon.log
Jun 15 17:39:32 cesar1 dnsmasq[10889]: started, version v2.62-7-g4ce4f37 cachesize 150
Jun 15 17:39:32 cesar1 dnsmasq[10889]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack
Jun 15 17:39:32 cesar1 dnsmasq-dhcp[10889]: DHCP, static leases only on 10.10.40.3, lease time 3d
Jun 15 17:39:32 cesar1 dnsmasq[10889]: reading /etc/resolv.conf
Jun 15 17:39:32 cesar1 dnsmasq[10889]: using nameserver 4.2.2.1#53
Jun 15 17:39:32 cesar1 dnsmasq[10889]: using nameserver 178.63.26.173#53
Jun 15 17:39:32 cesar1 dnsmasq[10889]: using nameserver 192.168.2.122#53
Jun 15 17:39:32 cesar1 dnsmasq[10889]: using nameserver 192.168.2.121#53
Jun 15 17:39:32 cesar1 dnsmasq[10889]: read /etc/hosts - 519 addresses
Jun 15 17:39:32 cesar1 dnsmasq-dhcp[10889]: read /var/lib/nova/networks/nova-br100.conf
Jun 15 21:59:41 cesar1 dnsmasq-dhcp[10889]: DHCPREQUEST(br100) 10.10.40.16 fa:16:3e:3d:ff:f3
Jun 15 21:59:41 cesar1 dnsmasq-dhcp[10889]: DHCPACK(br100) 10.10.40.16 fa:16:3e:3d:ff:f3 redis-appdata1

It seems this one VM was the only one that sent a DHCP request over the past 5 hours; that request was answered with a DHCPACK, and that was it. That is also when the host behind that IP (redis-appdata1) stopped functioning.
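To check a claim like "this VM was the only one that sent a DHCP request" across a larger daemon.log, a small parser can group dnsmasq-dhcp events by MAC address. This is a minimal sketch that assumes the syslog line layout shown in the excerpt above (other dnsmasq builds may log slightly differently); `summarize_dhcp` is a hypothetical helper, not part of nova or dnsmasq.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the excerpt above, e.g.:
#   Jun 15 21:59:41 cesar1 dnsmasq-dhcp[10889]: DHCPACK(br100) 10.10.40.16 fa:16:3e:3d:ff:f3 redis-appdata1
# The IPv4 address is optional because some events (e.g. DHCPDISCOVER) may log only a MAC.
DHCP_LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d\d:\d\d:\d\d) \S+ dnsmasq-dhcp\[(?P<pid>\d+)\]: "
    r"(?P<event>DHCP[A-Z]+)\((?P<iface>[^)]+)\) "
    r"(?:(?P<ip>\d+(?:\.\d+){3}) )?(?P<mac>[0-9a-f]{2}(?::[0-9a-f]{2}){5})"
)


def summarize_dhcp(lines, pid=None):
    """Return {mac: {event: [timestamps]}}, optionally restricted to one dnsmasq PID."""
    summary = defaultdict(lambda: defaultdict(list))
    for line in lines:
        m = DHCP_LINE.match(line)
        if not m or (pid is not None and m["pid"] != str(pid)):
            continue  # not a DHCP event line, or from another dnsmasq instance
        summary[m["mac"]][m["event"]].append(m["ts"])
    return summary
```

For example, passing the lines of /var/log/daemon.log with `pid=10889` would list each MAC with the timestamps of its DHCPREQUEST/DHCPACK events, which makes instances that stopped renewing stand out.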

However, I did update dnsmasq on our gateway node to the latest trunk of the dnsmasq git repository, killed dnsmasq, and restarted nova-network (which auto-starts one dnsmasq per device).

I had really hoped that the bug fixed by this update was the cause of the downtime, but apparently there MIGHT be another factor.

Unfortunately, there is nothing to read in the VM's syslog.
What else could cause the VM to forget its IP?
Could this also be caused by send_arp_for_ha=True?

Regards,
Christian.


___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


[Openstack] instances losing IP address while running, due to No DHCPOFFER

2012-06-14 Thread Christian Parpart
Hey all,

I feel really sad saying this, now that we have had quite a few instances in production for at least 5 days, but I have now encountered the second instance losing its IP address due to No DHCPOFFER (according to the syslog in the instance).

I checked the logs on the central nova-network and gateway node and found that dnsmasq is still replying to requests from all the other instances. It even received the request from the instance in question and, as far as I can tell so far (I'm investigating and will post logs asap), even sent an OFFER. But while dnsmasq seems to send an offer, the instance says it didn't receive one - wtf?

Please tell me what I can do to actually *fix* this issue, since it is really fatal.

One workaround I can see is to let newly created instances retrieve their IP via DHCP, but then reconfigure /etc/network/interfaces to continue with a static network setup. However, I'd much rather get the DHCP issue fixed.
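The workaround above amounts to writing the DHCP-assigned values back as a static ifupdown stanza. A minimal sketch, assuming Debian/Ubuntu `/etc/network/interfaces` syntax and that the address, prefix length and gateway are read from the lease the instance already obtained; `static_stanza` and the example gateway 10.10.40.1 are hypothetical.

```python
import ipaddress


def static_stanza(iface, cidr_addr, gateway, dns=None):
    """Render a static /etc/network/interfaces stanza from DHCP-assigned values."""
    addr = ipaddress.ip_interface(cidr_addr)  # e.g. "10.10.40.16/21"
    lines = [
        f"auto {iface}",
        f"iface {iface} inet static",
        f"    address {addr.ip}",
        f"    netmask {addr.network.netmask}",
        f"    gateway {gateway}",
    ]
    if dns:
        # dns-nameservers is only applied when the resolvconf package is installed
        lines.append("    dns-nameservers " + " ".join(dns))
    return "\n".join(lines) + "\n"


print(static_stanza("eth0", "10.10.40.16/21", "10.10.40.1"))
```

The trade-off is that the static address must keep matching the fixed IP nova-network assigned to the instance; nova still tracks that IP, so the stanza only removes the dependency on lease renewals.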

I'm very open to any kind of helpful comments :)

So long,
Christian.


Re: [Openstack] instances losing IP address while running, due to No DHCPOFFER

2012-06-14 Thread Nathanael Burton
Has nova-network been restarted? There was an issue where nova-network signalling dnsmasq would cause dnsmasq to stop responding to requests while still appearing to run fine.

You can check whether killing dnsmasq, restarting nova-network, and rebooting an instance allows it to get a DHCP address again ...

Nate


Re: [Openstack] instances losing IP address while running, due to No DHCPOFFER

2012-06-14 Thread Christian Parpart
Hey,

thanks for your reply. Unfortunately, neither nova-network nor dnsmasq was restarted; both processes seem to have been up for about 2 and 3 days.

However, why is the default dhcp_lease_time value 120s? Without overriding it, the clients actually re-acquire a new DHCP lease every 42 seconds (at least on my nodes), which is completely ridiculous.
OTOH, I took a look at the sources (linux_net.py) and found out why the max_lease_time is set to 2048: that is the size of my network.
So why is the max lease time the size of my network?
I've written a tiny patch to allow overriding this value in nova.conf and will submit it to Launchpad soon. I hope it'll be accepted and then also applied to Essex, since it is a very straightforward, helpful few-liner.
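The 2048 found in linux_net.py is most likely dnsmasq's `--dhcp-lease-max` option, which the dnsmasq manual describes as the maximum *number* of DHCP leases rather than a lease duration, so sizing it to the network makes sense. The lease *duration* is the trailing field of `--dhcp-range`. A minimal sketch contrasting the two, assuming a hypothetical 10.10.40.0/21 network (2048 addresses, consistent with the figure above and the 10.10.40.x addresses in the logs) and the static range mode that matches the "static leases only" log line; `dnsmasq_dhcp_args` is illustrative, not nova's code.

```python
import ipaddress


def dnsmasq_dhcp_args(cidr, dhcp_start, lease_time_s):
    """Build two dnsmasq options that are easy to confuse (illustration only).

    --dhcp-lease-max caps how MANY leases dnsmasq hands out, so deriving it
    from the network size is natural; how LONG a lease lasts is the last
    field of --dhcp-range (a bare number is read as seconds).
    """
    n_addresses = ipaddress.ip_network(cidr).num_addresses
    return [
        f"--dhcp-range={dhcp_start},static,{lease_time_s}",
        f"--dhcp-lease-max={n_addresses}",
    ]


print(dnsmasq_dhcp_args("10.10.40.0/21", "10.10.40.3", 120))
```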

Nevertheless, that does not explain why I have now had 2 (well, 3 actually) instances that stopped getting DHCP replies/offers after some hours/days.

I fixed the host that caused issues today (a few hours ago) by hard rebooting the instance. However, just about 40 minutes later it forgot its IP again, so it may be that it got no reply from the DHCP server (dnsmasq) shortly after it got its lease at instance boot.

So long,
Christian.



Re: [Openstack] instances losing IP address while running, due to No DHCPOFFER

2012-06-14 Thread Vishvananda Ishaya
Are you running in VLAN mode? If so, you probably need to update to a newer version of dnsmasq. See this message for reference:

http://osdir.com/ml/openstack-cloud-computing/2012-05/msg00785.html

Vish



Re: [Openstack] instances losing IP address while running, due to No DHCPOFFER

2012-06-14 Thread Nathanael Burton
There's a flag 'dhcp_lease_time' (in seconds) that can be set in nova.conf. DHCP clients typically renew every (dhcp_lease_time/2) seconds, but this varies by client. Additionally, some DHCP clients are not persistent: if there's ever a network hiccup and they don't get a DHCPACK, they give up and stop checking in, thus losing their lease and falling off the network.
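The "dhcp_lease_time/2" rule of thumb matches the client defaults in RFC 2131, section 4.4.5: the client starts renewing (unicast to its server) at T1 = 0.5 * lease and falls back to rebinding (broadcast to any server) at T2 = 0.875 * lease. A small sketch of those defaults; servers can send explicit renewal/rebinding times and clients may deviate, so observed intervals can differ.

```python
def renewal_timers(lease_time_s):
    """Default DHCP client timers (T1 renew, T2 rebind) per RFC 2131, section 4.4.5."""
    return 0.5 * lease_time_s, 0.875 * lease_time_s


# 120 s (the default discussed in this thread), one day, and three days
for lease in (120, 86400, 3 * 86400):
    t1, t2 = renewal_timers(lease)
    print(f"lease {lease:>6}s -> renew after {t1:.0f}s, rebind after {t2:.0f}s")
```

With the 120 s default this means a renewal attempt roughly every minute per instance, so a short dnsmasq outage quickly turns into lost leases for clients that are not persistent.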

On RHEL/CentOS/Fedora this is fixed by setting PERSISTENT_DHCLIENT=1 in your ifcfg-eth0 file. Not sure about Ubuntu.
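PERSISTENT_DHCLIENT=1 is a shell-style KEY=value line in the interface's ifcfg file (on these distributions usually /etc/sysconfig/network-scripts/ifcfg-eth0). A minimal sketch for setting it idempotently when preparing an image, assuming plain KEY=value lines; `set_ifcfg_option` is a hypothetical helper, not a distribution tool.

```python
def set_ifcfg_option(text, key, value):
    """Set KEY=value in ifcfg-style content, replacing any existing assignment of KEY."""
    out, found = [], False
    for line in text.splitlines():
        if line.split("=", 1)[0].strip() == key:
            if not found:
                out.append(f"{key}={value}")  # replace the first assignment
            found = True  # drop any later duplicates
        else:
            out.append(line)  # comments and other keys are kept unchanged
    if not found:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


cfg = "DEVICE=eth0\nBOOTPROTO=dhcp\nONBOOT=yes\n"
print(set_ifcfg_option(cfg, "PERSISTENT_DHCLIENT", "1"))
```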

Nate


Re: [Openstack] instances losing IP address while running, due to No DHCPOFFER

2012-06-14 Thread Narayan Desai
I vaguely recall Vish mentioning a bug in dnsmasq that caused a somewhat similar problem (it had to do with lease renewal problems on IP aliases or something like that).

This issue was particularly pronounced with Windows VMs, apparently.
 -nld



Re: [Openstack] instances losing IP address while running, due to No DHCPOFFER

2012-06-14 Thread Christian Parpart
Hey all,

many, many thanks for all your replies. I've just raised the DHCP timeouts, and now I'll get some sleep and actually apply the dnsmasq fix tomorrow.

Yes, I am running in VLAN mode, since this is also the recommended way.

Maybe OpenStack (nova-network) should check the version number of dnsmasq and, if running in VLAN mode, issue a (critical) warning in the logs, especially since this kind of error can lead to disasters in datacenters. :)
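The suggested startup check could look roughly like this. It is a minimal sketch, not nova code: `MIN_DNSMASQ` is a placeholder, since this thread does not name the dnsmasq release that contains the VLAN-related fix, and `warn_if_dnsmasq_too_old` is a hypothetical hook. The version regex is assumed to match both the "Dnsmasq version X.Y" line printed by `dnsmasq --version` and the "started, version v2.62-7-g4ce4f37" syslog line in the Jun 15 message.

```python
import logging
import re
import subprocess

# PLACEHOLDER, not a confirmed fix release: set this to whichever dnsmasq
# version actually contains the VLAN-related fix discussed in this thread.
MIN_DNSMASQ = (2, 62)

VERSION_RE = re.compile(r"version v?(\d+)\.(\d+)")


def parse_dnsmasq_version(text):
    """Return (major, minor) parsed from dnsmasq version text."""
    m = VERSION_RE.search(text)
    if m is None:
        raise ValueError("no dnsmasq version found in %r" % text)
    return int(m.group(1)), int(m.group(2))


def dnsmasq_too_old(text, vlan_mode, minimum=MIN_DNSMASQ):
    """True only when running in VLAN mode with a dnsmasq older than `minimum`."""
    return vlan_mode and parse_dnsmasq_version(text) < minimum


def warn_if_dnsmasq_too_old(vlan_mode):
    """Hypothetical startup hook: query the installed dnsmasq and log a critical warning."""
    out = subprocess.run(["dnsmasq", "--version"], capture_output=True, text=True).stdout
    if dnsmasq_too_old(out, vlan_mode):
        logging.critical("dnsmasq %d.%d is older than %d.%d; DHCP in VLAN mode may fail",
                         *parse_dnsmasq_version(out), *MIN_DNSMASQ)
```

Keeping the parsing separate from the subprocess call means the comparison can be tested without dnsmasq installed.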

I also hope that Ubuntu 12.04 will pick up this patch soon enough, so that we won't end up with a patch-dominated distribution :-)

Good night all,
Christian.



Re: [Openstack] instances losing IP address while running, due to No DHCPOFFER

2012-06-14 Thread Nathanael Burton
FWIW I haven't run across the dnsmasq bug in our environment using EPEL
packages.

Nate