Hi Zang,

On Wed, Jul 16, 2014 at 4:43 PM, Zang MingJie <zealot0...@gmail.com> wrote:
> Hi, all:
>
> While resolving the rebuild of br-tun flows after an ovs restart[1], we
> have found several l2pop problems:
>
> 1. L2pop depends on agent_boot_time to decide whether or not to send all
> port information, but agent_boot_time is unreliable. For example, if the
> service receives a port-up message before the agent status report, the
> agent will never receive any of the ports on other agents.

You're right, there is a race condition here: if the agent has more than one port on the same network, and it sends update_device_up() for every port before it sends its report_state(), it won't receive the fdb entries for this network. Is this the race you are mentioning above?

Since report_state is done in a dedicated greenthread, which is launched before the greenthread that manages ovsdb_monitor, the state of the agent should be updated before the agent becomes aware of its ports and sends get_device_details()/update_device_up(), am I wrong? So, after an agent restart, agent_uptime() should still be lower than agent_boot_time (as set in the conf, or its default) when the agent sends its first update_device_up(); the l2pop MD will then be aware of the restart and trigger the cast of all fdb entries to the restarted agent.

But I agree that this may rely on eventlet thread management, and on agent_boot_time, which can be misconfigured by the provider.

> 2. If openvswitch restarts, all flows are lost, including all l2pop
> flows, and the agent is unable to fetch or recreate the l2pop flows.
>
> To resolve these problems, I'm suggesting some changes:
>
> 1. Because agent_boot_time is unreliable, the service can't decide
> whether or not to send the flooding entry. But the agent can build up
> the flooding entries from the unicast entries; this has already been
> implemented[2].
>
> 2. Create an RPC from agent to service which fetches all fdb entries;
> the agent calls this RPC in `provision_local_vlan`, before setting up
> any port.[3]
>
> After these changes, the l2pop service part becomes simpler and more
> robust, with mainly two functions: first, return all fdb entries at once
> when requested; second, broadcast a single fdb entry when a port goes
> up/down.

That's an implementation we considered while implementing l2pop. Our purpose was to minimize RPC calls.
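
To make the race and the two approaches concrete, here is a toy Python model. Everything in it (ToyL2popServer, get_all_fdb(), build_flooding_entries(), the all-zero FLOODING_ENTRY tuple, the 180 s boot window and the timestamps) is hypothetical and heavily simplified; it is not the Neutron l2pop code, only a sketch of the push-vs-pull trade-off under those assumptions.

```python
from collections import defaultdict

# Hypothetical flooding-entry sentinel (all-zero MAC/IP); illustrative only.
FLOODING_ENTRY = ("00:00:00:00:00:00", "0.0.0.0")


class ToyL2popServer:
    """Hypothetical, heavily simplified stand-in for the l2pop server side."""

    def __init__(self, agent_boot_time):
        self.agent_boot_time = agent_boot_time
        # network_id -> agent tunnel IP -> list of (mac, ip) unicast entries
        self.fdb = defaultdict(lambda: defaultdict(list))
        # agent tunnel IP -> time of its last (re)start, learned via report_state()
        self.started_at = {}

    def report_state(self, agent_ip, now):
        self.started_at[agent_ip] = now

    def add_port(self, network_id, agent_ip, mac, ip):
        self.fdb[network_id][agent_ip].append((mac, ip))

    def get_all_fdb(self, network_id, agent_ip):
        """Pull model: return the unicast entries hosted on every other agent."""
        return {host: list(entries)
                for host, entries in self.fdb[network_id].items()
                if host != agent_ip and entries}

    def update_device_up(self, network_id, agent_ip, now):
        """Push model: send the full table only if the agent looks freshly booted."""
        uptime = now - self.started_at.get(agent_ip, now)
        if uptime < self.agent_boot_time:
            return self.get_all_fdb(network_id, agent_ip)
        # Otherwise only the new port would be broadcast to the *other*
        # agents (not modeled here); this agent itself receives nothing.
        return {}


def build_flooding_entries(unicast):
    """Agent side: derive one flooding entry per remote host from unicast entries."""
    return {host: [FLOODING_ENTRY] + entries for host, entries in unicast.items()}


server = ToyL2popServer(agent_boot_time=180)  # illustrative value, in seconds
for host in ("10.0.0.1", "10.0.0.2"):
    server.report_state(host, now=0)
server.add_port("net1", "10.0.0.1", "fa:16:3e:00:00:01", "192.168.0.1")
server.add_port("net1", "10.0.0.2", "fa:16:3e:00:00:02", "192.168.0.2")

# Agent 10.0.0.1 restarts at t=1000. Its port-up reaches the server at t=1001,
# before its new report_state() at t=1002, so the stale start time is used.
pushed = server.update_device_up("net1", "10.0.0.1", now=1001)
server.report_state("10.0.0.1", now=1002)

# Pull model: the agent asks for everything while provisioning the local VLAN.
pulled = build_flooding_entries(server.get_all_fdb("net1", "10.0.0.1"))

print(pushed)  # {}
print(pulled)
```

In this model the push approach saves RPC calls but only works if report_state() wins the race and agent_boot_time is sane; the pull approach costs one extra call per network on the agent side but does not depend on thread ordering or on agent_boot_time.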
But if this implementation is buggy due to uncontrolled thread ordering and/or misuse of the agent_boot_time parameter, it's worth investigating your proposal [3]. However, I don't see why [3] depends on [2]. Couldn't we have a network_sync() sent by the agent during provision_local_vlan() that would reconfigure ovs when the agent and/or ovs restarts?

> [1] https://bugs.launchpad.net/neutron/+bug/1332450
> [2] https://review.openstack.org/#/c/101581/
> [3] https://review.openstack.org/#/c/107409/
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev