Re: [openstack-dev] [neutron] high dhcp lease times in neutron deployments considered harmful (or not???)

Kevin Benton Wed, 04 Feb 2015 01:02:40 -0800

I proposed an alternative to adjusting the lease time early on in the
thread. By specifying the renewal time (DHCP option 58), we can have the
benefits of a long lease time (resiliency to long DHCP server outages)
while having a frequent renewal interval to check for IP changes. I favored
this approach because it only required a patch to dnsmasq to allow that
option to be set and a patch to our agent to set that option, both of which
are pretty straightforward.
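
To make the trade-off concrete, here is a minimal sketch of the timer
arithmetic. It assumes the RFC 2131 defaults (T1 = 0.5 x lease, T2 = 0.875 x
lease) when no renewal time is given. The function name and the returned keys
are hypothetical and are not taken from dnsmasq or the Neutron agent.

```python
def dhcp_timers(lease_seconds, renewal_seconds=None):
    """Return the DHCP timers and the trade-off they imply, in seconds.

    T1 (option 58) is when the client starts renewing with its server.
    T2 (option 59) is when it falls back to rebinding with any server.
    With no override, the RFC 2131 defaults of 0.5 and 0.875 of the
    lease are assumed.
    """
    if lease_seconds <= 0:
        raise ValueError("lease must be positive")
    t1 = lease_seconds // 2 if renewal_seconds is None else renewal_seconds
    t2 = (lease_seconds * 7) // 8
    if not 0 < t1 < t2:
        raise ValueError("renewal time must be positive and below T2")
    return {
        "t1": t1,
        "t2": t2,
        # A client checks in every T1 seconds, so a changed IP, gateway or
        # DNS server is picked up after at most roughly T1.
        "max_staleness": t1,
        # Worst case: the server dies just before a renewal was due, so
        # T1 of the lease has already been used up.
        "min_outage_survived": lease_seconds - t1,
    }


default = dhcp_timers(86400)                      # 24h lease, default T1
short_t1 = dhcp_timers(86400, renewal_seconds=60)  # 24h lease, 60s renewal
print(default["max_staleness"], default["min_outage_survived"])
print(short_t1["max_staleness"], short_t1["min_outage_survived"])
```

With a 24-hour lease, dropping the renewal time to 60 seconds cuts the
worst-case staleness from 12 hours to about a minute while the guest still
rides out a DHCP outage of almost the full 24 hours.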


>- Just don't allow users to change their IPs without a reboot.

How can we do this? Call Nova from Neutron to force a reboot when the port
is updated?

>- Bounce the link under the VM when the IP is changed, to force the guest
to re-request a DHCP lease immediately.

I had thought about this as well and it's the approach that I think would
be ideal, but the Nova VIF code would require changes to add support for
changing interface state. Its definition of "plugging" and "unplugging" is
actually creating and deleting the interfaces, which might not work so well
with running VMs. Then more changes would have to be made on the Nova side
to react to a port IP change notification from Neutron to trigger the
interface bounce. Finally, a small change would have to be made to Neutron
to send the IP change event to Nova.

The number of changes it would require on the Nova side deterred me from
pursuing it further.

>- Remove the IP spoofing firewall feature

I think this makes sense as a tenant-configurable option for networks they
own, but I don't think we should throw it out. It makes for good protection
on networks facing Internet traffic that could have compromised hosts.
Along the same lines, we make use of shared networks, which can have other
shady tenants who might be dishonest when it comes to IP addresses.

>- Make the IP spoofing firewall allow an overlap of both old and new
addresses until the DHCP lease time is up (or the instance reboots).  Adds
some additional async tasks, but this is clearly the required solution if
we want to keep all our existing features.

I didn't find a clean spot to put this. Spoofing rules are generated a long
way away from the code that knows about IP updates. Maybe we could tack it
onto the response to the query from the agent for allowed address pairs.
Then we have to deal with persisting these temporary allowed addresses to
the DB (not a big deal, but still a schema change). Another issue here
would be if Neutron then allocated that address for another port while it
was still in use by the old node. We will probably have to block IPAM from
re-allocating that address for another port during this window as well.
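
As a rough sketch of the bookkeeping this would need, the class below keeps
an old address allowed for its port until the lease time has passed and
refuses to hand it out again in the meantime. Everything here is
hypothetical: the class and method names are made up for illustration, the
state lives in memory instead of the DB, and none of it is existing Neutron
code.

```python
import time


class AddressOverlapTracker:
    """Track old fixed IPs that must stay usable until the lease expires."""

    def __init__(self, lease_seconds, clock=time.monotonic):
        self._lease = lease_seconds
        self._clock = clock
        self._grace = {}  # old ip -> (port_id, expires_at)

    def _purge(self):
        # Drop entries whose grace window has ended.
        now = self._clock()
        self._grace = {ip: entry for ip, entry in self._grace.items()
                       if entry[1] > now}

    def ip_changed(self, port_id, old_ip):
        # The guest may keep using old_ip until its lease runs out.
        self._grace[old_ip] = (port_id, self._clock() + self._lease)

    def instance_rebooted(self, port_id):
        # A reboot forces a fresh DHCP request, so the overlap can end early.
        self._grace = {ip: entry for ip, entry in self._grace.items()
                       if entry[0] != port_id}

    def allowed_ips(self, port_id, current_ips):
        # What the anti-spoofing rules for this port should permit.
        self._purge()
        extra = [ip for ip, entry in self._grace.items()
                 if entry[0] == port_id]
        return sorted(set(current_ips) | set(extra))

    def can_allocate(self, ip):
        # IPAM should skip addresses that are still inside a grace window.
        self._purge()
        return ip not in self._grace
```

The injectable clock is only there so that the expiry logic can be exercised
without waiting for a real lease to time out.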

However, this doesn't solve the general slowness of DHCP info propagation
for other updates (subnet gateway change, DNS nameserver change, etc.), so I
would still like to go forward with the more frequent renewal interval. I
will also look into eliminating the downtime completely with your last
suggestion if it can be implemented without impacting too much stuff.

On Tue, Feb 3, 2015 at 11:01 PM, Angus Lees <[email protected]> wrote:

> There's clearly not going to be any amount of time that satisfies both
> concerns here.
>
> Just to get some other options on the table, here's some things that would
> allow a non-zero dhcp lease timeout _and_ address Kevin's original bug
> report:
>
> - Just don't allow users to change their IPs without a reboot.
>
> - Bounce the link under the VM when the IP is changed, to force the guest
> to re-request a DHCP lease immediately.
>
> - Remove the IP spoofing firewall feature  (<- my favourite, for what it's
> worth. I've never liked presenting a layer2 abstraction but then forcing
> specific layer3 addressing choices by default)
>
> - Make the IP spoofing firewall allow an overlap of both old and new
> addresses until the DHCP lease time is up (or the instance reboots).  Adds
> some additional async tasks, but this is clearly the required solution if
> we want to keep all our existing features.
>
> On Wed Feb 04 2015 at 4:28:11 PM Aaron Rosen <[email protected]>
> wrote:
>
>> I believe I was the one who changed the default value of this. When we
>> upgraded our internal cloud (~6k networks back then) from folsom to grizzly,
>> we didn't account for the fact that if the dhcp-agents went offline,
>> instances would give up their lease and unconfigure themselves, causing an
>> outage. Setting a larger value for this helps to avoid this downtime (as
>> Brian pointed out as well). Personally, I wouldn't really expect my instance
>> to automatically change its ip - I think requiring the user to reboot the
>> instance or use the console to correct the ip should be good enough.
>> Especially since this will help buy you shorter downtime if an agent fails
>> for a little while, which is probably more important than having the
>> instance change its ip.
>>
>> Aaron
>>
>> On Tue, Feb 3, 2015 at 5:25 PM, Kevin Benton <[email protected]> wrote:
>>
>>> I definitely understand the use-case of having updatable stuff and I
>>> don't intend to support any proposals to strip away that functionality.
>>> What Brian was suggesting was to block port IP changes since they depend
>>> on DHCP to deliver that information to the hosts. I was just pointing out
>>> that we would need to block any API operations that resulted in different
>>> information being delivered via DHCP for that approach to make sense.
>>>
>>> On Tue, Feb 3, 2015 at 5:01 PM, Robert Collins <
>>> [email protected]> wrote:
>>>
>>>> On 3 February 2015 at 00:48, Kevin Benton <[email protected]> wrote:
>>>> >> The only thing this discussion has convinced me of is that allowing
>>>> >> users to change the fixed IP address on a neutron port leads to a bad
>>>> >> user-experience.
>>>> ...
>>>>
>>>> >> Documenting a VM reboot is necessary, or even deprecating this (you
>>>> >> won't like that) are sounding better to me by the minute.
>>>> >
>>>> > If this is an approach you really want to go with, then we should at
>>>> > least be consistent and deprecate the extra dhcp options extension (or
>>>> > at least the ability to update ports' dhcp options). Updating subnet
>>>> > attributes like gateway_ip, dns_nameservers, and host_routes should be
>>>> > thrown out as well. All of these things depend on the DHCP server to
>>>> > deliver updated information and are hindered by renewal times. Why
>>>> > discriminate against IP updates on a port? A failure to receive many of
>>>> > those other types of changes could result in just as severe a
>>>> > connection disruption.
>>>>
>>>> So the reason we added the extra dhcp options extension was to support
>>>> PXE booting physical machines for Nova baremetal, and then Ironic. It
>>>> wasn't added for end users to use on the port, but as a generic way of
>>>> supporting the specific PXE options needed - and that was done that
>>>> way after discussing w/Neutron devs.
>>>>
>>>> We update ports for two reasons. Primarily, Ironic is HA and will move
>>>> the TFTPd that boots are happening from if an Ironic node has failed.
>>>> Secondly, because a not uncommon operation on physical machines is to
>>>> replace broken NICs, and forcing a redeploy seemed unreasonable. The
>>>> former case doesn't affect running nodes since it's only consulted on
>>>> reboot. The second case is by definition only possible when the NIC in
>>>> question is offline (whether hotplug hardware or not).
>>>>
>>>> -Rob
>>>>
>>>>
>>>> --
>>>> Robert Collins <[email protected]>
>>>> Distinguished Technologist
>>>> HP Converged Cloud
>>>>
>>>>
>>>> __________________________________________________________________________
>>>> OpenStack Development Mailing List (not for usage questions)
>>>> Unsubscribe:
>>>> [email protected]?subject:unsubscribe
>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>>
>>>
>>>
>>>
>>> --
>>> Kevin Benton
>>>
>>>
>>>
>>>
>>
>
>
>


-- 
Kevin Benton

