Hi Garyk, Salvatore, Mark and Dan,

Thanks for the spec reviews. I have modified the spec, and
https://review.openstack.org/#/c/15619/4 is now consistent with it.

I think the main concerns are:
1. The created_at and updated_at fields.
2. The cache on the server.
The new design and implementation no longer includes either of these.
3. Router distribution among multiple L3 agents: we will handle this in the Quantum scheduler and the multi-host and multi-agent features. In the scheduler, agents will report their state to the Quantum server, and the server can then schedule routers and DHCP resources to the appropriate agents (see the sketch below).
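
For illustration only, here is a rough sketch of that scheduling idea;
none of these names come from the actual patches:

    # Hypothetical sketch: agents heartbeat to the server, and the
    # server assigns each router to the least-loaded live agent.
    import time

    AGENT_DOWN_TIME = 15  # seconds without a report => agent is down

    class AgentRegistry(object):
        def __init__(self):
            self._heartbeats = {}  # agent host -> last report time

        def report_state(self, host):
            # Called over RPC by each l3/dhcp agent.
            self._heartbeats[host] = time.time()

        def live_agents(self):
            now = time.time()
            return [h for h, t in self._heartbeats.items()
                    if now - t < AGENT_DOWN_TIME]

    def schedule_router(registry, assignments, router_id):
        # Pick the live agent currently hosting the fewest routers.
        live = registry.live_agents()
        if not live:
            return None
        host = min(live, key=lambda h: len(assignments.get(h, [])))
        assignments.setdefault(host, []).append(router_id)
        return host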

Thanks
Yong Sheng Gong

On 11/14/2012 09:18 AM, Dan Wendlandt wrote:
Yes, realistically, the blueprint should have said "avoid expensive polling when using l3-agent". During Folsom, the way we did this for the L2 agents was to introduce an RPC layer, so that is what we named this blueprint; in hindsight, that specifies the mechanism, not the goal.

Dan

On Tue, Nov 13, 2012 at 2:28 PM, Mark McClain <[email protected]> wrote:

    Sorry for the delay responding.  I wanted to read through the
    proposed review which was updated overnight.

    I think the code is starting to head in the right direction, but I
    don't think it goes far enough.  I was thinking about the problem
    on my run today and realized that part of the issue might be the
    blueprint description.  The blueprint summary and title say to
    convert to RPC, but in reality all that is needed is a combination
    of notifications (which are already emitted by Quantum) and
    targeted API calls.  Adding RPC actually increases the complexity
    and number of changes and essentially duplicates notification
    functionality.

    At DreamHost, we built our own L3 router service based on
    Quantum's notifications and HTTP API.  Using both, we were able to
    keep the existing agent/server relationship and improve the
    communication efficiency by only making requests when needed.
    Another benefit of this design was that we were able to keep the
    number of files changed to a minimum: one (l3_agent.py).
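
    Roughly, the shape of it is this (a sketch with illustrative
    names, not the actual code):

        # Hypothetical sketch: react to a Quantum notification, then
        # fetch only the affected router over the HTTP API -- no
        # polling. 'client' is a python-quantumclient Client; the two
        # callbacks stand in for the agent's local plumbing.
        def handle_notification(client, event_type, router_id,
                                configure_router, remove_router):
            if event_type == 'router.delete.end':
                remove_router(router_id)
            elif event_type.startswith('router.'):
                router = client.show_router(router_id)['router']
                configure_router(router)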

    mark

    PS:  I'm working to get that code open sourced, so folks can take a look.

    On Nov 13, 2012, at 7:54 AM, Gary Kotton <[email protected]> wrote:

    Hi,
    I too have added some comments to the document.
    Thanks
    Gary


    On 11/13/2012 12:06 PM, Salvatore Orlando wrote:
    Hi Yong,

    I added some more comments on the google document.
    I don't think this design is bad. Still, I believe we can smooth
    some details in order to keep the efficiency improvement you are
    achieving while having less impact on the plugin.

    I also have some more comments inline.

    Thanks,
    Salvatore

    On 12 November 2012 23:50, gong yong sheng
    <[email protected]> wrote:

        Hi salv-orlando and markmcclain,

        There was no reply for a long time after I sent out the spec,
        so I had not paid attention to it for a while.


    I do apologise for that. However, as you know, when this happens
    it's not because we're deliberately ignoring the work.

        I have replied to Mark's comments.

        I have to say, this design is highly efficient.


    I agree that the goal is to increase the efficiency of the
    interface between the plugin and the agent.

        1. The L3 agent does not hit the server many times within a sync cycle.


    Agreed, but I suspect we'll be paying an undesired price in
    terms of scalability of the server-side component.
    I have some comments on the google document and a suggestion for
    an alternative, which I'm pretty sure you've already considered.
    So it's up to you to tell me why the alternative would be worse
    than the solution you're proposing :)

        2. We use an adjustable sync period (sketched below) so that
        even if an administrator operates on routers' data frequently,
        the system's behaviour remains predictable. There is no
        notification or data exchange between the L3 agent and the
        Quantum server for every router operation. This will create
        latency between a router update and its going into effect. Of
        course, we can modify the algorithm so that the L3 agent syncs
        data after each router and its related data are modified
        (including created, deleted, and updated).
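
        For illustration, the agent's loop is roughly (names
        hypothetical):

            # Hypothetical sketch of the adjustable periodic sync loop:
            # the first pass is a full sync, later passes incremental.
            import time

            def sync_loop(agent, sync_interval):
                synctype = 'full'
                while True:
                    agent.sync_routers(synctype)
                    synctype = 'incremental'
                    time.sleep(sync_interval)  # the tunable period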


    I find it interesting that you see periodic sync as a
    better approach than notifications. I agree the sync period
    is tuneable, and expert deployers will be able
    to analyse their traffic patterns and find the optimal sync
    period. Still, notifications are a widely and successfully used
    mechanism in a broad range of apps. So I'm curious to hear why you
    think they might not be as good as periodic sync in our case.


          Interface

        The interface between the Quantum server and the L3 agent is simple:
        l3 agent -> quantum server:
            sync_routers(host, synctype)
        synctype is either full sync or incremental sync:
        the first sync is full, and then we use incremental
        sync for normal operations.
        If sync
        quantum server -> l3 agent:
            router_deleted(router_id)
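
        For illustration, a minimal stub of that interface (the class
        names are hypothetical; the method names are from the spec):

            class L3ServerRpcApi(object):
                # Called by the l3 agent on the quantum server.
                def sync_routers(self, host, synctype):
                    """Return this host's routers and related data;
                    synctype is 'full' or 'incremental'."""

            class L3AgentRpcApi(object):
                # Called by the quantum server on the l3 agent.
                def router_deleted(self, router_id):
                    """Tear down local state for the removed router."""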


    Are you explicitly using notifications for delete events in
    order to avoid the need for soft deletes?
    From what I gather the sync mechanism is not able to cope with
    object deletions.
    Soft deletes are actually my biggest concern. Is this the only
    kind of notification you're looking at?


          Data structure on server side:


            Mapper for L3 agents' sync objects:

        The Quantum server keeps a mapper for the sync objects of its
        agents; a sync object just records the last sync time.

        To deal with a Quantum server restart:
        the Quantum server will force a full sync on each agent's next
        sync, to rebuild the cache.
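
        For illustration (names hypothetical):

            import time

            class SyncState(object):
                # A "sync object": it just keeps the last sync time.
                def __init__(self):
                    self.last_sync = None  # None => full sync needed

            sync_objects = {}  # agent host -> SyncState

            def on_sync_request(host):
                # After a server restart the mapper is empty, so each
                # agent's next sync automatically becomes a full sync.
                state = sync_objects.setdefault(host, SyncState())
                synctype = ('full' if state.last_sync is None
                            else 'incremental')
                state.last_sync = time.time()
                return synctype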


    I don't understand this bit. Will the quantum server send a
    notification to all agents inviting them to do a full sync?

        To deal with an L3 agent restart:
        the L3 agent will do a full sync, which replaces its sync
        object on the server side.


    This is pretty much clear.


            Big router concept

        On the server side, we have the concept of a big router: it
        includes the router, its gateway port, its interfaces, and the
        related floating IPs (see the sketch below).

        One sync transfers all of this data in one shot from the
        server to the L3 agent.

        With multi-host and multi-L3-agent support coming, we will be
        able to distribute the big routers among L3 agents, so don't
        worry about the data size in one sync.
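
        For illustration, one big router in a sync payload might look
        like this (field names and values are made up):

            big_router = {
                'id': 'router-uuid',
                'gw_port': {'id': 'gw-port-uuid',
                            'ip_address': '172.24.4.2'},
                'interfaces': [{'id': 'port-uuid',
                                'cidr': '10.0.0.0/24'}],
                'floating_ips': [{'fixed_ip_address': '10.0.0.3',
                                  'floating_ip_address': '172.24.4.10'}],
            }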


    Indeed. But why worry about maintaining the last-sync state for
    a lot of agents? I know your answer would be that it's just a
    data structure which maps an agent id to a timestamp, and it's a
    good argument. But we'll also have increased state because of
    the added fields, and increased computation logic, as you'll need
    to scan all objects to verify whether any have been
    created or updated since the last sync, and the number of those
    objects can grow quite a lot.
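
    To make the cost concrete, each incremental sync amounts to
    something like this on the server, per agent, per period (a
    SQLAlchemy sketch; the model here is hypothetical):

        from sqlalchemy import Column, DateTime, String
        from sqlalchemy.ext.declarative import declarative_base

        Base = declarative_base()

        class Router(Base):
            __tablename__ = 'routers'
            id = Column(String(36), primary_key=True)
            updated_at = Column(DateTime)  # one of the proposed columns

        def routers_changed_since(session, last_sync):
            # Repeated for ports, floating IPs, etc., for every agent,
            # on every sync period.
            return session.query(Router).filter(
                Router.updated_at > last_sync).all()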


            Patches are:

        Add created_at and updated_at datetime columns.
        <https://review.openstack.org/#/c/15476/> I think adding
        created_at and updated_at is agreed on by many core members,
        even if we don't agree on the sync approach.
        l3 agent rpc (WORK IN PROGRESS).
        <https://review.openstack.org/#/c/15619/> For now, it
        implements the sync algorithm.

        Thanks
        Yong Sheng Gong



--
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dan Wendlandt
Nicira, Inc: www.nicira.com
twitter: danwendlandt
~~~~~~~~~~~~~~~~~~~~~~~~~~~




-- 
Mailing list: https://launchpad.net/~quantum-core
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~quantum-core
More help   : https://help.launchpad.net/ListHelp
