Hi Salvatore,

Thanks for your reply...

On 08/05/15 09:20, Salvatore Orlando wrote:
Just like the Neutron plugin manager, the ML2 driver manager ensures
drivers are loaded only once, regardless of the number of workers.
What Kevin did proves that drivers are correctly loaded before forking
(I reckon).

Yes, up to a point. It seems clear that we can rely on the following events being ordered:

1. Mechanism drivers are instantiated (__init__) and initialized (initialize).

2. The Neutron server forks (into a number of copies as dictated by api_workers and rpc_workers).

3. Mechanism driver entry points such as create_port_pre/postcommit are called.

However...

However, forking is something to be careful about, especially when
using eventlet. For the plugin my team maintains, we were creating a
periodic task during plugin initialisation.
This led to an interesting condition where API workers were hanging
[1]. The situation was fixed with a rather pedestrian fix: adding a
delay.

Yes! This is precisely the situation I have. Currently I am also planning to 'fix' it by adding a delay of a few seconds, as sketched below. However, that is not a great fix: if there is something that a mechanism driver needs to do on startup, it would probably rather do it as soon as possible; and the delay involves guessing how long steps (1) and (2) above will take.
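For concreteness, here is a minimal sketch of that workaround, assuming an eventlet-based driver; the 10-second delay and the _start_processing name are illustrative, not Calico's actual code:

  import eventlet

  from neutron.plugins.ml2 import driver_api


  class DelayedStartDriver(driver_api.MechanismDriver):
      def initialize(self):
          # Defer the real startup work, hoping that the server has
          # finished forking before the timer fires.  The delay is a
          # guess: too short and the work races the fork; too long
          # and startup is needlessly slow.
          eventlet.spawn_after(10, self._start_processing)

      def _start_processing(self):
          # Startup work involving network communication, which must
          # not end up running in every worker.
          pass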

Readers may be wondering why a mechanism driver needs to do something on startup at all. In general, the answer is to recheck the Neutron DB - i.e. any VMs/ports that should already exist - and ensure that the driver's downstream components are all correctly in sync with it. In Calico's case, that means auditing that the routing and iptables state on each compute host matches the current VM and security configuration.

This need is implied by the existence of the _postcommit entry points. When a mechanism driver is implemented using those entry points, it is possible for the driver or downstream software to crash after the Neutron DB believes that a transaction has been committed, leaving the dataplane state wrong. Clearly, then, when the driver or downstream software is restarted, it needs to resync against the standing Neutron DB.
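In sketch form, such a start-of-day resync might look like the following; audit_dataplane here is a hypothetical, driver-specific helper, while get_ports() is the core plugin's real query API:

  from neutron import context as n_context
  from neutron import manager


  def resync_with_neutron_db(audit_dataplane):
      ctx = n_context.get_admin_context()
      plugin = manager.NeutronManager.get_plugin()
      # Everything the Neutron DB believes should exist...
      ports = plugin.get_ports(ctx)
      # ...audited against downstream state (for Calico, the routing
      # and iptables on each compute host), repairing any divergence.
      audit_dataplane(ports)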

Generally speaking, I would find it useful to have a way to "identify"
an API worker, in order to designate a specific one for processing
that should not be made redundant.
On the other hand, I object to my own statement above by saying that
API workers are not supposed to do this kind of processing, which
should be deferred to some other helper process.

+1 on both points :-)

There could be a post_fork() mechanism driver entry point. It wouldn't matter which worker or helper process called it; the requirement would simply be that it is called exactly once, after all the forking has occurred.
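(None of the following exists in ML2 today; the hook name, and the idea that the server calls it from one designated process, are hypothetical:)

  # Hypothetical addition to the ML2 MechanismDriver base class.
  class MechanismDriver(object):

      # ... existing entry points: initialize(),
      # create_port_precommit(), create_port_postcommit(), ...

      def post_fork(self):
          """Called exactly once, in one process, after all forking.

          A driver overrides this to kick off work that must not be
          duplicated across workers - for example, a start-of-day
          resync against the Neutron DB.
          """
          pass

  # And on the server side, after all api_workers/rpc_workers have
  # been spawned, one designated process would call, for each
  # registered driver:
  #
  #     driver.obj.post_fork()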

Regards,
        Neil


Salvatore

[1] https://bugs.launchpad.net/vmware-nsx/+bug/1420278

On 8 May 2015 at 09:43, Kevin Benton <blak...@gmail.com> wrote:

    I'm not sure I understand the behavior you are seeing. When your
    mechanism driver gets initialized and kicks off processing, all of
    that should be happening in the parent PID. I don't know why your
    child processes would start executing code that was never invoked.
    Can you provide a pointer to the code, or a sample that reproduces
    the issue?

    I modified the linuxbridge mech driver to try to reproduce it:
    http://paste.openstack.org/show/216859/

    In the output, I never saw any of the init code output I added
    more than once, including output from the function spawned using
    eventlet.

    The only time I ever saw anything executed by a child process was
    actual API requests (e.g. the create_port method).
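
    (For readers who cannot reach that paste, the modification was
    presumably along these lines - a sketch, not the paste's actual
    contents:)

        import os

        import eventlet

        from neutron.plugins.ml2.drivers import mech_linuxbridge


        class InstrumentedDriver(
                mech_linuxbridge.LinuxbridgeMechanismDriver):
            def initialize(self):
                # Report which PID runs initialize() and which PID
                # runs a greenthread spawned from it.
                print('initialize() in pid %d' % os.getpid())
                super(InstrumentedDriver, self).initialize()
                eventlet.spawn(self._report_loop)

            def _report_loop(self):
                while True:
                    print('spawned loop in pid %d' % os.getpid())
                    eventlet.sleep(5)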


    On Thu, May 7, 2015 at 6:08 AM, Neil Jerram
    <neil.jer...@metaswitch.com> wrote:

        Is there a design for how ML2 mechanism drivers are supposed to
        cope with the Neutron server forking?

        What I'm currently seeing, with api_workers = 2, is:

        - my mechanism driver gets instantiated and initialized, and
        immediately kicks off some processing that involves
        communicating over the network

        - the Neutron server process then forks into multiple copies

        - multiple copies of my driver's network processing then
        continue, and interfere badly with each other :-)

        I think what I should do is:

        - wait until any forking has happened

        - then decide (somehow) which copy of my mechanism driver is
        going to kick off that processing, and do that.

        But how can a mechanism driver know when the Neutron server
        forking has happened?
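
        (One partial heuristic, sketched here: record the PID when
        initialize() runs, and have any later work check whether it
        is still in that process. This detects that a fork has
        happened, but not that forking has finished - which is the
        real question.)

            import os


            class ForkAwareMixin(object):
                def initialize(self):
                    # PID of the process that ran initialize(), i.e.
                    # the pre-fork parent.
                    self._init_pid = os.getpid()

                def _in_original_process(self):
                    return os.getpid() == self._init_pid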

        Thanks,
                 Neil

        




    --
    Kevin Benton







