Hello openflowplugin-dev

I found a bug and opened it here:
https://bugs.opendaylight.org/show_bug.cgi?id=6625
I raised it as critical because it looks really alarming for any large-scale
production environment, where flapping networks unfortunately happen a lot.

When an RPC is called to add/remove/update a flow on a switch, and the switch 
goes down while the request is in flight, the RPC never returns a result, 
because it is waiting for the underlying async request to complete. Since that 
request never completes once the device is no longer present, the thread is 
leaked along with the FD for the RESTCONF request that triggered the RPC.

The current fix [1] is to set a failed future once a timeout of 2000 
milliseconds is reached. This way the RPC returns and the resources are freed. 
About that timeout: I’ve seen that the RequestContext can be given a timeout, 
but this wasn’t doing anything.
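
To illustrate the idea, here is a minimal sketch of the timeout approach. It 
is not the code from [1]: it uses java.util.concurrent.CompletableFuture as a 
stand-in for the plugin's own future types, and the class and method names 
(RpcTimeoutSketch, failAfter) are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of openflowplugin.
final class RpcTimeoutSketch {
    // Single daemon thread that fires the timeouts.
    private static final ScheduledExecutorService TIMER =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "rpc-timeout");
            t.setDaemon(true);
            return t;
        });

    // Fails 'pending' with a TimeoutException if nothing has completed it
    // within timeoutMillis. A reply that arrives in time wins and cancels
    // the timer.
    static <T> CompletableFuture<T> failAfter(CompletableFuture<T> pending,
                                              long timeoutMillis) {
        ScheduledFuture<?> timer = TIMER.schedule(() -> {
            pending.completeExceptionally(new TimeoutException(
                "no reply from device within " + timeoutMillis + " ms"));
        }, timeoutMillis, TimeUnit.MILLISECONDS);
        // Drop the timer as soon as the request completes either way.
        pending.whenComplete((result, error) -> timer.cancel(false));
        return pending;
    }
}
```

With something like failAfter(requestFuture, 2000), the caller always gets a 
result or a failure, so the thread and the FD can be released.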

I think this issue is hiding a deeper problem regarding resource management 
and the global tracking/lifecycle of in-flight requests for a given switch: 
when the switch goes down, all ongoing requests should be closed.
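
As a rough sketch of what I mean by tracking, assuming one tracker instance 
per connected switch: every request is registered when it is sent, and all of 
them are failed when the switch disconnects. Again this uses plain 
CompletableFuture, and InFlightRequests, track and onDeviceDisconnected are 
hypothetical names, not existing plugin APIs.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-switch registry of requests that are still in flight.
final class InFlightRequests {
    private final Set<CompletableFuture<?>> pending = ConcurrentHashMap.newKeySet();
    private volatile boolean closed;

    // Registers a request; it is unregistered automatically on completion.
    <T> CompletableFuture<T> track(CompletableFuture<T> request) {
        pending.add(request);
        request.whenComplete((result, error) -> pending.remove(request));
        if (closed) {
            // The switch went away before or while we were registering.
            request.completeExceptionally(
                new IllegalStateException("device disconnected"));
        }
        return request;
    }

    // Called once when the switch goes down: fail everything still pending.
    void onDeviceDisconnected() {
        closed = true;
        for (CompletableFuture<?> request : pending) {
            request.completeExceptionally(
                new IllegalStateException("device disconnected"));
        }
    }

    int pendingCount() {
        return pending.size();
    }
}
```

With this kind of bookkeeping a timeout would only be a safety net, since the 
disconnect itself would release the waiting RPCs.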

Please provide feedback on this.

[1]: https://git.opendaylight.org/gerrit/#/c/45112/ 


_______________________________________________
openflowplugin-dev mailing list
[email protected]
https://lists.opendaylight.org/mailman/listinfo/openflowplugin-dev
