On 01/28/2015 09:50 AM, Kevin Benton wrote:
Hi,
Approximately a year and a half ago, the default DHCP lease time in
Neutron was increased from 120 seconds to 86400 seconds.[1] This was
done with the goal of reducing DHCP traffic with very little
discussion (based on what I can see in the review and bug report).
While it does indeed reduce DHCP traffic, I don't think any bug
reports were filed showing that a 120 second lease time resulted in
too much traffic or that a jump all of the way to 86400 seconds was
required instead of a value in the same order of magnitude.
I guess that would be a good case for the FORCERENEW DHCP extension [1],
though after digging through the dnsmasq code a bit, I doubt it supports
the extension (though e.g. the systemd DHCP client/server from the
networkd module does). Le sigh.
[1]: https://tools.ietf.org/html/rfc3203
Why does this matter?
Neutron ports can be updated with a new IP address from the same
subnet or another subnet on the same network. The port update will
result in anti-spoofing iptables rule changes that immediately stop
the old IP address from working on the host. This means the host is
unreachable for 0-12 hours based on the current default lease time
without manual intervention[2] (assuming half-lease length DHCP
renewal attempts).
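The 0-12 hour figure follows directly from the lease arithmetic. A
minimal sketch, assuming (as above) that clients first try to renew at
half the lease time; the function name and the renewal_fraction
parameter are illustrative and not taken from Neutron or dnsmasq:

```python
def outage_window_seconds(lease_seconds, renewal_fraction=0.5):
    """Return (best, worst) case seconds a client keeps using its old IP.

    Assumption: the client only learns its new address at its next
    renewal attempt, which happens renewal_fraction of the way through
    the lease. An address change right before a renewal costs ~0 seconds;
    one right after a renewal costs the full renewal interval.
    """
    worst = lease_seconds * renewal_fraction
    return 0.0, worst


# Compare the old 120 second default with the current 86400 second one.
for lease in (120, 86400):
    _, worst = outage_window_seconds(lease)
    print(f"lease={lease}s -> worst-case outage {worst / 3600:.2f} h")
```

With the current default, the worst case is half of 86400 seconds,
i.e. 12 hours.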
Why is this on the mailing list?
In an attempt to make the VMs usable in a much shorter timeframe
following a Neutron port address change, I submitted a patch to reduce
the default DHCP lease time to 8 minutes.[3] However, this was
upsetting to several people,[4] so it was suggested I bring this
discussion to the mailing list. The following are the high-level
concerns followed by my responses:
  * 8 minutes is arbitrary
      o Yes, but it's no more arbitrary than 1440 minutes. I picked it
        as an interval because it is still 4 times larger than the
        last short value, but it still allows VMs to regain
        connectivity in <5 minutes in the event their IP is changed.
        If someone has a good suggestion for another interval based on
        known dnsmasq QPS limits or some other quantitative reason,
        please chime in here.
  * other datacenters use long lease times
      o This is true, but it's not really a valid comparison. In most
        regular datacenters, updating a static DHCP lease has no
        effect on the data plane so it doesn't matter that the client
        doesn't react for hours/days (even with DHCP snooping
        enabled). However, in Neutron's case, the security groups are
        immediately updated so all traffic using the old address is
        blocked.
  * dhcp traffic is scary because it's broadcast
      o ARP traffic is also broadcast and many clients will expire
        entries every 5-10 minutes and re-ARP. L2population may be
        used to prevent ARP propagation, so the comparison between
        DHCP and ARP isn't always relevant here.
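For anyone wanting to put rough numbers on the traffic side of this
trade-off, here is a back-of-the-envelope sketch. It assumes one
renewal request per port every half lease, spread evenly over time;
the port count of 1000 is a made-up figure for illustration, not a
measurement, and the function name is hypothetical:

```python
def renewals_per_second(num_ports, lease_seconds, renewal_fraction=0.5):
    """Average DHCP renewal requests per second seen by one DHCP server.

    Assumptions: every port renews once per (lease * renewal_fraction)
    seconds and renewals are uniformly distributed, so there are no
    bursts. Real deployments may look burstier than this.
    """
    renewal_interval = lease_seconds * renewal_fraction
    return num_ports / renewal_interval


NUM_PORTS = 1000  # hypothetical network size, for illustration only

# Old default, proposed 8 minute value, and current default.
for lease in (120, 480, 86400):
    rate = renewals_per_second(NUM_PORTS, lease)
    print(f"lease={lease:>5}s -> ~{rate:.3f} renewals/s")
```

Comparing the resulting rates against a measured dnsmasq limit would
give a quantitative basis for picking the interval.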
Please reply back with your opinions/anecdotes/data related to short
DHCP lease times.
Cheers
1. https://github.com/openstack/neutron/commit/d9832282cf656b162c51afdefb830dacab72defe
2. Manual intervention could be an instance reboot, a dhcp client
invocation via the console, or a delayed invocation right before the
update (all significantly more difficult to script than a simple
update of a port's IP via the API).
3. https://review.openstack.org/#/c/150595/
4. http://i.imgur.com/xtvatkP.jpg
--
Kevin Benton
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev