Re: [Openstack] instances loosing IP address while running, due to No DHCPOFFER
Hey,

I seem to have the same issue with our VMs. I commented (comment #7) on a bug report that seems to correspond to our DHCP issues: https://bugs.launchpad.net/nova/+bug/887162

If you are still affected by this issue, please report it on the bug page so the developers can look into a fix.

Regards,

(In reply to Christian Parpart's message of Saturday, 16 June 2012, 01:19.)

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help : https://help.launchpad.net/ListHelp
Re: [Openstack] instances loosing IP address while running, due to No DHCPOFFER
Hey all,

it has now happened twice more, both times today, the last at 22:00 UTC, with the following in nova-network's syslog:

    root@gw1:/var/log# grep 'dnsmasq.*10889' daemon.log
    Jun 15 17:39:32 cesar1 dnsmasq[10889]: started, version v2.62-7-g4ce4f37 cachesize 150
    Jun 15 17:39:32 cesar1 dnsmasq[10889]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack
    Jun 15 17:39:32 cesar1 dnsmasq-dhcp[10889]: DHCP, static leases only on 10.10.40.3, lease time 3d
    Jun 15 17:39:32 cesar1 dnsmasq[10889]: reading /etc/resolv.conf
    Jun 15 17:39:32 cesar1 dnsmasq[10889]: using nameserver 4.2.2.1#53
    Jun 15 17:39:32 cesar1 dnsmasq[10889]: using nameserver 178.63.26.173#53
    Jun 15 17:39:32 cesar1 dnsmasq[10889]: using nameserver 192.168.2.122#53
    Jun 15 17:39:32 cesar1 dnsmasq[10889]: using nameserver 192.168.2.121#53
    Jun 15 17:39:32 cesar1 dnsmasq[10889]: read /etc/hosts - 519 addresses
    Jun 15 17:39:32 cesar1 dnsmasq-dhcp[10889]: read /var/lib/nova/networks/nova-br100.conf
    Jun 15 21:59:41 cesar1 dnsmasq-dhcp[10889]: DHCPREQUEST(br100) 10.10.40.16 fa:16:3e:3d:ff:f3
    Jun 15 21:59:41 cesar1 dnsmasq-dhcp[10889]: DHCPACK(br100) 10.10.40.16 fa:16:3e:3d:ff:f3 redis-appdata1

It seems that this one VM was the only one that sent a DHCP request over the past 5 hours; that first one was answered with a DHCPACK, and that is it. That was the time at which the host behind that IP (redis-appdata1) stopped functioning.

However, I have now updated dnsmasq on our gateway node to the latest trunk of the dnsmasq git repository, killed dnsmasq, and restarted nova-network (which auto-starts dnsmasq per device). I really hoped that this one particular bug was the cause of the downtime, but apparently there MIGHT be another factor. Unfortunately, there is nothing to read in the VM's syslog.

What else could cause the VM to forget its IP? Can this also be caused by send_arp_for_ha=True?

Regards,
Christian.
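When comparing what dnsmasq logged against what an instance claims to have received, it can help to reduce the syslog to one event list per MAC address. The following is a minimal sketch, not part of nova or dnsmasq: the function names are made up for illustration, and the regular expression assumes the syslog line layout shown in the log excerpt above.

```python
import re
from collections import defaultdict

# Matches dnsmasq-dhcp syslog lines in the layout seen in the thread, e.g.
#   Jun 15 21:59:41 cesar1 dnsmasq-dhcp[10889]: DHCPACK(br100) 10.10.40.16 fa:16:3e:3d:ff:f3 redis-appdata1
# The IP address is optional because not every DHCP message carries one.
DHCP_LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s+\S+\s+dnsmasq-dhcp\[(?P<pid>\d+)\]:\s+"
    r"(?P<event>DHCP[A-Z]+)\((?P<iface>[^)]+)\)\s+"
    r"(?:(?P<ip>\d+\.\d+\.\d+\.\d+)\s+)?(?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2})"
)


def parse_dhcp_events(lines):
    """Return {mac: [(timestamp, event, ip), ...]} in log order."""
    events = defaultdict(list)
    for line in lines:
        match = DHCP_LINE.match(line.strip())
        if match:
            events[match.group("mac")].append(
                (match.group("ts"), match.group("event"), match.group("ip"))
            )
    return dict(events)


def unanswered_clients(events):
    """MACs whose most recent logged event is a client message with no reply after it."""
    client_messages = {"DHCPDISCOVER", "DHCPREQUEST"}
    return sorted(mac for mac, evs in events.items() if evs[-1][1] in client_messages)


# Lines copied from the log excerpt in the thread.
SAMPLE = [
    "Jun 15 17:39:32 cesar1 dnsmasq-dhcp[10889]: read /var/lib/nova/networks/nova-br100.conf",
    "Jun 15 21:59:41 cesar1 dnsmasq-dhcp[10889]: DHCPREQUEST(br100) 10.10.40.16 fa:16:3e:3d:ff:f3",
    "Jun 15 21:59:41 cesar1 dnsmasq-dhcp[10889]: DHCPACK(br100) 10.10.40.16 fa:16:3e:3d:ff:f3 redis-appdata1",
]

if __name__ == "__main__":
    parsed = parse_dhcp_events(SAMPLE)
    for mac, evs in parsed.items():
        print(mac, [event for _, event, _ in evs])
    print("unanswered:", unanswered_clients(parsed))
```

A MAC that shows up in `unanswered_clients` only tells you that dnsmasq logged no reply; a MAC that does not show up while the instance still reports "No DHCPOFFER" points at the path between dnsmasq and the instance instead.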
[Openstack] instances loosing IP address while running, due to No DHCPOFFER
Hey all,

I feel really sad saying this: now that we have had quite a few instances in production for at least 5 days, I have encountered the second instance losing its IP address due to "No DHCPOFFER" (according to syslog in the instance).

I checked the logs on the central nova-network and gateway node and found that dnsmasq still replies to requests from all the other instances. It even got the request from the instance in question and even sent an OFFER, as far as I can tell by now (I'm investigating and will post logs ASAP). But while dnsmasq seems to send an offer, the instance says it didn't receive one. WTF?

Please tell me what I can do to actually *fix* this issue, since it is very serious for us.

One option I see (as a workaround) is to let a newly created instance retrieve its IP via DHCP, but then reconfigure /etc/network/interfaces to continue with a static networking setup. However, I'd just like the DHCP thingy to get fixed.

I'm very open to any kind of helpful comments. :)

So long,
Christian.
Re: [Openstack] instances loosing IP address while running, due to No DHCPOFFER
Has nova-network been restarted? There was an issue where nova-network was signalling dnsmasq, which would cause dnsmasq to stop responding to requests yet appear to be running fine. You can see whether killing dnsmasq, restarting nova-network, and rebooting an instance allows it to get a DHCP address again ...

Nate

(In reply to Christian Parpart, Jun 14, 2012, 4:46 PM.)
Re: [Openstack] instances loosing IP address while running, due to No DHCPOFFER
Hey,

thanks for your reply. Unfortunately, neither nova-network nor dnsmasq was restarted; both processes seem to have been up for about 2 and 3 days.

However, why does dhcp_lease_time default to 120s? Not overriding it causes the clients to re-acquire a DHCP lease every 42 seconds (at least on my nodes), which is completely ridiculous. OTOH, I took a look at the sources (linux_net.py) and found out why the max_lease_time is set to 2048: because that is the size of my network. So why is the max lease time the size of my network? I've written a tiny patch to allow overriding this value in nova.conf and will submit it to Launchpad soon. I hope it'll be accepted and then also applied to Essex, since this is a very straightforward and helpful few-liner.

Nevertheless, that does not explain why I have now had 2 (well, 3 actually) instances no longer getting DHCP replies/offers after some hours or days. I fixed the one host that caused issues today (a few hours ago) by hard-rebooting the instance; however, just about 40 minutes later it forgot its IP again. So one might say that it maybe did not get any reply from the DHCP server (dnsmasq) almost right after it got a lease on instance boot.

So long,
Christian.

(In reply to Nathanael Burton, Thu, Jun 14, 2012, 10:55 PM.)
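A note on the 2048 figure: it matches the number of addresses in a /21 network. dnsmasq has an option, `--dhcp-lease-max`, that limits the number of leases rather than their duration. If the value seen in linux_net.py is what ends up in that option (an assumption, since the thread does not quote the code), then it being equal to the network size would be expected. The arithmetic can be checked with a short sketch; the CIDR below is hypothetical, chosen only because it contains the 10.10.40.x addresses from the thread.

```python
import ipaddress


def address_count(cidr):
    """Number of addresses in a CIDR block, including network and broadcast."""
    return ipaddress.ip_network(cidr).num_addresses


# Hypothetical CIDR for illustration; the thread only says the network size is 2048.
EXAMPLE_CIDR = "10.10.40.0/21"

if __name__ == "__main__":
    print(EXAMPLE_CIDR, "holds", address_count(EXAMPLE_CIDR), "addresses")
```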
Re: [Openstack] instances loosing IP address while running, due to No DHCPOFFER
Are you running in VLAN mode? If so, you probably need to update to a new version of dnsmasq. See this message for reference: http://osdir.com/ml/openstack-cloud-computing/2012-05/msg00785.html

Vish

(In reply to Christian Parpart, Jun 14, 2012, 1:41 PM.)
Re: [Openstack] instances loosing IP address while running, due to No DHCPOFFER
There's a flag 'dhcp_lease_time' (in secs) that can be set in nova.conf. DHCP clients typically renew every (dhcp_lease_time/2) seconds, but this varies by client.

Additionally, some DHCP clients are not persistent, meaning that if there's ever a network hiccup and they don't get a DHCPACK, they will give up and stop checking in, thus losing their lease and falling off the network. On RHEL/CentOS/Fedora this is fixed by setting PERSISTENT_DHCLIENT=1 in your ifcfg-eth0 file. Not sure about Ubuntu.

Nate

(In reply to Christian Parpart, Jun 14, 2012, 7:02 PM.)
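The lease_time/2 rule of thumb corresponds to the default client timers in RFC 2131: T1 (start renewing) at 0.5 of the lease and T2 (start rebinding) at 0.875 of the lease. A minimal sketch of that arithmetic, using the two lease times that appear in this thread (the function name is made up for illustration):

```python
def renewal_timers(lease_seconds):
    """Default DHCP client timers per RFC 2131: (T1 renew, T2 rebind), in seconds."""
    t1 = lease_seconds * 0.5
    t2 = lease_seconds * 0.875
    return t1, t2


if __name__ == "__main__":
    # 120 s is the default discussed in the thread; 3 days is the lease time in the dnsmasq log.
    for lease in (120, 3 * 24 * 3600):
        t1, t2 = renewal_timers(lease)
        print(f"lease={lease}s renew_after={t1:.0f}s rebind_after={t2:.0f}s")
```

Real clients may randomize these values or receive explicit T1/T2 options from the server, so an observed interval such as the 42 seconds reported earlier does not have to match the defaults exactly.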
Re: [Openstack] instances loosing IP address while running, due to No DHCPOFFER
I vaguely recall Vish mentioning a bug in dnsmasq that had a somewhat similar problem (it had to do with lease renewal problems on IP aliases or something like that). This issue was particularly pronounced with Windows VMs, apparently.

-nld

(In reply to Christian Parpart, Thu, Jun 14, 2012, 6:02 PM.)
Re: [Openstack] instances loosing IP address while running, due to No DHCPOFFER
Hey all,

many, many thanks for all your replies. Having just raised the DHCP lease times, I now have enough time to sleep and will apply the dnsmasq fix tomorrow.

Yes, I am running in VLAN mode, since this is also the recommended way. Maybe OpenStack (nova-network) should check the version number of dnsmasq and, if running in VLAN mode, issue a (critical) warning in the logs, especially since this kind of error can lead to disasters in datacenters. :)

I also hope that Ubuntu 12.04 will pick up this patch soon enough, so that we won't end up with a patch-dominated distribution :-)

Good night all,
Christian.

(In reply to Narayan Desai, Fri, Jun 15, 2012, 1:16 AM.)
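The version check suggested above could look roughly like the sketch below. This is not how nova-network behaves; it is only an illustration. The function names are invented, and `PLACEHOLDER_MINIMUM` is a stand-in value, because the thread does not say which dnsmasq release contains the fix.

```python
import re
import subprocess


def parse_dnsmasq_version(text):
    """Extract (major, minor) from text such as 'Dnsmasq version 2.62 ...'
    or the syslog form 'started, version v2.62-7-g4ce4f37 cachesize 150'."""
    match = re.search(r"version\s+v?(\d+)\.(\d+)", text)
    if not match:
        raise ValueError("no dnsmasq version found in: %r" % text)
    return int(match.group(1)), int(match.group(2))


def installed_dnsmasq_version():
    """Ask the local dnsmasq binary for its version (requires dnsmasq on PATH)."""
    result = subprocess.run(
        ["dnsmasq", "--version"], capture_output=True, text=True, check=True
    )
    return parse_dnsmasq_version(result.stdout)


def needs_warning(version, minimum, vlan_mode):
    """True when running in VLAN mode with a dnsmasq older than `minimum`."""
    return bool(vlan_mode) and version < minimum


# HYPOTHETICAL threshold: the thread does not state which release has the fix.
PLACEHOLDER_MINIMUM = (2, 61)

if __name__ == "__main__":
    seen = parse_dnsmasq_version("started, version v2.62-7-g4ce4f37 cachesize 150")
    if needs_warning(seen, PLACEHOLDER_MINIMUM, vlan_mode=True):
        print("WARNING: dnsmasq %d.%d may be too old for VLAN mode" % seen)
    else:
        print("dnsmasq %d.%d passes the placeholder check" % seen)
```

Comparing `(major, minor)` tuples keeps the check simple; it ignores git suffixes such as `-7-g4ce4f37`, so a trunk build is treated the same as the release it is based on.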
Re: [Openstack] instances loosing IP address while running, due to No DHCPOFFER
FWIW I haven't run across the dnsmasq bug in our environment using EPEL packages.

Nate

(In reply to Vishvananda Ishaya, Jun 14, 2012, 7:20 PM.)