** Changed in: neutron
Status: In Progress => Opinion
https://bugs.launchpad.net/bugs/1806390
Title:
[RFE] Distributed DHCP agent
Status in neutron:
Opinion
Bug description:
This is a very old issue that ended up being marked invalid, but I
could not find an ideal solution, so I am raising it again. I wonder
what others think of it.
It is closely related to the old issue
(https://bugs.launchpad.net/neutron/+bug/1468236), and I have
reconstructed that issue below from my understanding.
Problems
- A giant shared provider network with more than 10000 ports in a single
network.
- Several DHCP agents for the network, even one per hypervisor for the Calico
project.
- A scalability issue occurs (the DHCP lease file is not updated after the VM
has started).
Solutions from the reporter
1. Add a distributed flag to the DHCP agent, and provision a DHCP agent on
every compute node.
2. Change the DHCP agent notifier to target the DHCP agent per host.
3. Do not spread DHCP flows outside of the local hypervisor.
Conclusion
- Rejected because
  - Solution step (2) adds significant complexity to the agent notifier RPC.
  - (3) is not a general solution.
  - It is even worse for migration. There were many side effects that we would
  have to care about.
  - There were building blocks with which we could achieve the purpose. (This
  was mentioned on IRC, but I still do not understand which building blocks
  were meant.)
Our private cluster is very much like Calico. We have a giant
provider network, make it routable with quagga, and run a DHCP agent
on each compute node. I believe the community has formed some
consensus that this kind of architecture handles scale issues well,
judging by approaches like Routed networks.
To achieve this architecture given the lack of L2, modifying the DHCP
agent cannot be avoided, since its default HA behavior causes critical
DB performance issues.
At the same time, I absolutely agree with the comments that raise
concerns about the unnecessary complexity of a distributed approach
like DVR.
So what I suggest is
- Do not modify current DHCP agent behavior such as the notifier-side API.
This does not harm migration logic.
- Do not change the DHCP HA concept or the L2 agent at all.
- Just add a distributed flag to the DHCP agent, and add host filtering logic
to the handler-side RPC (get_active_network_info, get_network_info) only when
the DHCP agent is distributed.
- Operators get one small new concept, distributed DHCP, in which the agent
serves only the ports within its local hypervisor.
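To make the handler-side filtering concrete, here is a minimal sketch of
what I have in mind. It is not Neutron code: the Port class, its host field,
and the filter_ports_for_agent() helper are hypothetical names used only for
illustration, and I assume the handler already knows the calling agent's host
and whether that agent is flagged as distributed.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Port:
    """Hypothetical stand-in for a port record returned to the DHCP agent."""
    id: str
    host: str  # hypervisor the port is bound to (assumed field name)


def filter_ports_for_agent(ports: Iterable[Port], agent_host: str,
                           distributed: bool) -> List[Port]:
    """Return the ports that should be sent to the calling DHCP agent.

    A non-distributed agent keeps today's behavior and gets every port.
    A distributed agent only gets the ports bound to its own host.
    """
    if not distributed:
        return list(ports)
    return [port for port in ports if port.host == agent_host]


if __name__ == "__main__":
    all_ports = [
        Port(id="p1", host="compute-1"),
        Port(id="p2", host="compute-2"),
        Port(id="p3", host="compute-1"),
    ]
    local = filter_ports_for_agent(all_ports, "compute-1", distributed=True)
    print([port.id for port in local])
```

In this sketch the filter is applied only on the reply path, so the notifier
side and the HA scheduling stay untouched, which is the point of the proposal.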
What we can achieve from the change
- Reduced performance overhead. I found that the performance penalty is on the
DB side (getting ports with get_active_info(), and completing the provisioning
step with dhcp_ready_on_ports()). RPC fanout is minor.
- A new concept in which the DHCP agent failure domain is split.
Any comments are appreciated.