Hi Garyk, Salvatore, Mark and Dan,
Thanks for the spec reviews.
I have modified the spec, and
https://review.openstack.org/#/c/15619/4 is now consistent with
the spec.
I think the main concerns are:
1. The created_at and updated_at fields.
2. The cache on the server.
The new design and implementation drops both of these.
3. Distribution of routers among multiple l3 agents: we will do this in
the quantum scheduler and the multi-host and multi-agent features.
In the scheduler, we will enable agents to report their state to the
quantum server, so that the quantum server can schedule routers and DHCP
resources to the appropriate agents, roughly as sketched below.
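For illustration, a minimal sketch of what the agent-side state report
could look like; the method name report_state, its payload fields, and
the interval are all assumptions, since the scheduler interface is not
designed yet:

    import time

    REPORT_INTERVAL = 30  # seconds; assumed to be tunable via config

    def report_state_loop(rpc_client, host):
        """Periodically report this agent's liveness and load, so the
        quantum server can schedule routers to healthy agents.

        rpc_client.report_state is a hypothetical RPC method; the real
        name and payload will be defined by the scheduler work.
        """
        while True:
            rpc_client.report_state({
                'host': host,
                'agent_type': 'L3 agent',
                'configurations': {'router_count': 0},  # illustrative load metric
            })
            time.sleep(REPORT_INTERVAL)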
Thanks
Yong Sheng Gong
On 11/14/2012 09:18 AM, Dan Wendlandt wrote:
Yes, realistically, the blueprint should have said "avoid expensive
polling when using the l3-agent". During Folsom, the way we did this
for the L2 agents was by introducing an RPC layer, so that is what we
named this blueprint, but in hindsight that specifies the mechanism,
not the goal.
Dan
On Tue, Nov 13, 2012 at 2:28 PM, Mark McClain
<[email protected]> wrote:
Sorry for the delay in responding. I wanted to read through the
proposed review, which was updated overnight.
I think the code is starting to head in the right direction, but I
don't think it goes far enough. I was thinking about the problem on
my run today and realized that part of the issue might be the
blueprint description. The blueprint summary and title say to convert
to RPC, but in reality all that is needed is a combination of
notifications (which are already emitted by Quantum) and targeted API
calls. Adding RPC actually increases the complexity and the number of
changes, and essentially duplicates the notification functionality.
At DreamHost, we built our own L3 router service based on Quantum's
notifications and HTTP API. Using both, we were able to keep the
existing agent/server relationship and improve communication
efficiency by only making requests when needed. Another benefit of
this design was that we kept the number of changed files to a
minimum: one (l3_agent.py).
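A rough sketch of that pattern, assuming Quantum's router.*.end
notification payloads and python-quantumclient's v2 API (the two local
helpers are hypothetical, and this is not DreamHost's actual code):

    from quantumclient.v2_0 import client

    qc = client.Client(username='admin', password='secret',
                       tenant_name='admin',
                       auth_url='http://127.0.0.1:5000/v2.0/')

    def handle_notification(event):
        """Dispatch on a Quantum notification and make a targeted API
        call for just the affected router, instead of polling everything."""
        event_type = event['event_type']
        payload = event['payload']
        if event_type == 'router.delete.end':
            remove_local_router(payload['router_id'])   # hypothetical helper
        elif event_type in ('router.create.end', 'router.update.end'):
            router = qc.show_router(payload['router']['id'])['router']
            configure_local_router(router)              # hypothetical helper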
mark
PS: I'm working to get that code open sourced, so folks can take a look.
On Nov 13, 2012, at 7:54 AM, Gary Kotton <[email protected]> wrote:
Hi,
I too have added some comments to the document.
Thanks
Gary
On 11/13/2012 12:06 PM, Salvatore Orlando wrote:
Hi Yong,
I added some more comments on the google document.
I don't think this design is bad. Still, I believe we can smooth out
some details in order to keep the efficiency improvement you are
achieving while reducing the impact on the plugin.
I also have some more comments inline.
Thanks,
Salvatore
On 12 November 2012 23:50, gong yong sheng
<[email protected]> wrote:
Hi salv-orlando and markmcclain,
There was no reply for a long time after I sent out the
spec, so I had not paid attention to it for a while.
I do apologise for that. However, as you know, when this happens
it's not because we're deliberately ignoring the work.
I have replied to Mark's comments.
I have to say, this design is highly efficient.
I agree that the goal should be to improve the efficiency of the
interface between the plugin and the agent.
1. The l3 agent does not hit the server many times within a sync cycle.
Agreed, but I suspect we'll be paying an undesired price in
terms of scalability of the server-side component.
I have some comments on the google document and a suggestion for
an alternative, which I'm pretty sure you've already considered.
So it's up to you telling me why the alternative would be worse
than the solution you're proposing :)
2. We use an adjustable sync period, so that even if an
administrator operates on router data frequently, the system's
behaviour remains predictable. There is no notification and data
exchange between the l3 agent and the quantum server for every
router operation. This does introduce latency between a router
update and its taking effect. Of course, we could modify the
algorithm so that the l3 agent syncs data after each router and
its related data is modified (including created, deleted, and
updated).
I find it interesting that you see periodic sync as a better
approach than notifications. I agree the synchronization period is
tunable, and expert deployers will be able to analyse their traffic
pattern and find the optimal sync period. Still, notifications are a
widely and successfully used mechanism across a wide range of
applications, so I'm curious to hear why you think they might not be
as good as periodic sync in our case.
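For concreteness, a sketch of the adjustable periodic sync loop being
discussed; the interval default and the helper are assumptions, but
sync_routers(host, synctype) matches the interface described below:

    import time

    class L3SyncLoop(object):
        """Illustrative periodic sync loop with a tunable interval."""

        def __init__(self, rpc_client, host, sync_interval=40):
            self.rpc_client = rpc_client
            self.host = host
            self.sync_interval = sync_interval  # the adjustable knob
            self.synced_once = False

        def run(self):
            while True:
                # Full sync on the first pass, incremental afterwards.
                synctype = 'incremental' if self.synced_once else 'full'
                routers = self.rpc_client.sync_routers(self.host, synctype)
                apply_router_changes(routers)  # hypothetical helper
                self.synced_once = True
                time.sleep(self.sync_interval)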
interface
the interface between the quantum server and the l3 agent is simple:
l3 agent -> quantum server:
sync_routers(host, synctype)
synctype is either a full sync or an incremental sync:
the first sync is a full sync, and then we use incremental
syncs for normal operations.
If sync
quantum server -> l3 agent:
router_deleted(router_id)
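A server-side sketch of how sync_routers might be answered, assuming
the proposed updated_at column and a couple of hypothetical helpers:

    import datetime

    sync_times = {}  # host -> last sync time (the per-agent sync object)

    def sync_routers(context, host, synctype):
        """Return the routers the calling agent should serve.

        A full sync returns everything for the host; an incremental
        sync returns only routers touched since the agent's last sync.
        """
        now = datetime.datetime.utcnow()
        query = routers_on_host(context, host)  # hypothetical helper
        if synctype == 'incremental' and host in sync_times:
            # Router is the model with the proposed updated_at column.
            query = query.filter(Router.updated_at > sync_times[host])
        sync_times[host] = now
        return [make_big_router(r) for r in query]  # see "big router" below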
Are you explicitly using notifications for delete events in
order to avoid the need for soft deletes?
From what I gather the sync mechanism is not able to cope with
object deletions.
Soft deletes are actually my biggest concern. Is this the only
kind of notification you're looking at?
Data structure on the server side:
mapper for the l3 agents' sync objects:
the quantum server keeps a mapper of sync objects for the agents;
a sync object just keeps the last sync time.
to deal with a quantum server restart:
the quantum server will force a full sync on the next incoming sync
to rebuild the cache.
I don't understand this bit. Will the quantum server send a
notification to all agents inviting them to do a full sync?
to deal with l3 agent restarts:
the l3 agent will do a full sync, which replaces its sync object on
the server side.
This is pretty much clear.
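A sketch of how both restart cases fall out of that in-memory mapper
(the names are assumptions):

    sync_times = {}  # host -> last sync time, kept only in server memory

    def effective_synctype(host, requested_type):
        """Server restart: sync_times is empty, so the first request
        from each agent is upgraded to a full sync to rebuild state.
        Agent restart: the agent itself requests a full sync, which
        simply replaces the server's entry for it."""
        if host not in sync_times:
            return 'full'
        return requested_type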
big router concept
on the server side, we have the concept of a big router: it includes
the router, its gateway port, its interfaces, and its related
floating ips.
one sync transfers all of this data in one shot from the server
to the l3 agent.
with multi-host and multiple l3 agents coming, we will be able
to distribute the big routers among the l3 agents, so don't
worry about the data size in one sync.
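What one "big router" payload might look like; the field names and
helpers here are illustrative, not the actual wire format:

    def make_big_router(router, session):
        """Bundle a router with everything the l3 agent needs, so one
        sync round trip carries the whole picture."""
        return {
            'id': router['id'],
            'name': router['name'],
            'gw_port': get_gateway_port(session, router),        # hypothetical
            'interfaces': get_interface_ports(session, router),  # hypothetical
            'floating_ips': get_floating_ips(session, router),   # hypothetical
        }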
Indeed. But why worry about maintaining the last sync state for
a lot of agents? I know your answer would be that it's just a
data structure which maps an agent id to a timestamp, and that's a
fair argument. But we'll also have increased state because of
the added fields, and increased computation, as you'll need
to scan all objects to verify whether any have been
created or updated since the last sync, and the number of those
objects can grow quite large.
patches are:
Add created_at and updated_at datetime columns
<https://review.openstack.org/#/c/15476/>: I think adding
created_at and updated_at is agreed on by many core members,
even if we don't agree on the sync approach (see the sketch
after this list).
l3 agent rpc (WORK IN PROGRESS)
<https://review.openstack.org/#/c/15619/>: it is the sync
algorithm for now.
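For reference, the column addition is conceptually just the following
SQLAlchemy sketch (not the actual patch):

    import sqlalchemy as sa
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class TimestampMixin(object):
        """The proposed audit columns, as a reusable mixin."""
        created_at = sa.Column(sa.DateTime, default=sa.func.now(),
                               nullable=False)
        updated_at = sa.Column(sa.DateTime, default=sa.func.now(),
                               onupdate=sa.func.now(), nullable=False)

    class Router(TimestampMixin, Base):
        __tablename__ = 'routers'
        id = sa.Column(sa.String(36), primary_key=True)

    # Incremental sync then reduces to a timestamp-filtered query:
    #   session.query(Router).filter(Router.updated_at > last_sync)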
Thanks
Yong Sheng Gong
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dan Wendlandt
Nicira, Inc: www.nicira.com
twitter: danwendlandt
~~~~~~~~~~~~~~~~~~~~~~~~~~~
--
Mailing list: https://launchpad.net/~quantum-core
Post to : [email protected]
Unsubscribe : https://launchpad.net/~quantum-core
More help : https://help.launchpad.net/ListHelp