Hi all,

I have tested openflowplugin many times with switches frequently going online and offline. Many of the tests failed because openflowplugin was no longer able to handle the switch-online event correctly.

Here are the reasons I found for the problem.

1. Creating and closing a ContextChain are not synchronized.

When a device connects, ContextChainHolderImpl creates a ContextChain for that device. It then registers a ClusterSingletonService, which can take a while because there are locks in mdsal-singleton. If the device disconnects due to idleness, or the clustering services are not able to start, the ContextChain for that device is destroyed. When ContextChain.close() examines the ClusterSingletonService registration, it may find null because the registration is still in progress. As a result, the device's ContextChain is destroyed but its ClusterSingletonService instance still exists in mdsal-singleton, which can lead to the second issue described below.

2. Before getting into the second issue, let's recall how openflowjava deals with switch I/O.

A channel is a connection between the controller and a switch. Each channel is served by a single thread from the EventLoopGroup. That thread is responsible for reading data from the channel and writing data to it. When a switch comes online, a channel to that switch is created and an EventLoopGroup thread reads the data transmitted by the switch. Once the OpenFlow handshake succeeds, the switch initialization procedure in OFplugin is called by the EventLoopGroup thread. When it reaches MDSAL for cluster service registration, there are two different situations.

In the first situation, the Cluster Singleton Service for the switch has not been started yet, so the EventLoopGroup thread returns. Later an akka thread sends an onOwnershipChanged notification to the Cluster Singleton Service and the initialization process for that switch is invoked (ClusterSingletonServiceGroupImpl.startServices()). The initialization process requires the switch description (DeviceContextImpl.instantiateServiceInstance), so it sends a request to an outbound queue and waits until it gets the result or times out.
The EventLoopGroup thread consumes the outbound queue and sends the request down to the switch. Once it receives the response, it passes it up. The akka thread then gets the information it needs and finishes the initialization process.

In the second situation, the Cluster Singleton Service for the switch has already been started, so the EventLoopGroup thread moves on to starting the service itself. Starting the device service requires the switch description, and the thread waits until it gets the result. However, the EventLoopGroup thread is the one that sends requests to the switch and receives its responses, and it is stuck waiting for the result. The request is never sent, the wait times out, and the device initialization process fails.

As you might have noticed, the second situation is exactly what happens once a device has gone through the path described in the first issue. Once a ContextChain has been closed without closing its Cluster Singleton Service registration, the device can no longer be initialized.

However, even if the first issue does not happen, the EventLoopGroup thread can still start the service on its own, because this is a legal state in ClusterSingletonServiceGroupImpl.registerService():

    switch (localServicesState) {
        case STARTED:
            LOG.debug("Service group {} starting late-registered service {}", identifier, service);
            service.instantiateServiceInstance();
            break;
        ...
    }

There is a chance that the service state is already STARTED before the service is added.

My questions are:

1. How can we prevent a ContextChain from being closed without closing its Cluster Singleton Service registration?
2. How can we prevent the EventLoopGroup thread from waiting for the switch description while registering the Cluster Singleton Service?

If anyone has looked into these issues, please share your opinions.

Thanks,
James
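
P.S. To make the second situation easier to discuss, here is a minimal sketch of the self-wait, assuming a single-threaded executor as a stand-in for the EventLoopGroup thread that serves one channel. This is not openflowplugin or openflowjava code: the class name, the method names, the 500 ms timeout and the "switch-description" value are all hypothetical and only illustrate the threading pattern.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class EventLoopSelfWaitSketch {

    // Hypothetical stand-in for the switch description request: the I/O work is
    // queued on the single event-loop thread, and the caller blocks until the
    // reply arrives or the timeout expires.
    public static boolean requestSwitchDescription(ExecutorService eventLoop, long timeoutMillis)
            throws InterruptedException {
        Future<String> reply = eventLoop.submit(() -> "switch-description");
        try {
            reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException | ExecutionException e) {
            reply.cancel(false);
            return false;
        }
    }

    // First situation: a different thread (the akka thread in the mail) waits,
    // so the event-loop thread is free to send the request and read the reply.
    public static boolean initFromOtherThread() throws Exception {
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        try {
            return requestSwitchDescription(eventLoop, 500);
        } finally {
            eventLoop.shutdownNow();
        }
    }

    // Second situation: the event-loop thread itself starts the service and
    // waits. The queued I/O task can only run on the thread that is waiting,
    // so the wait ends with a timeout.
    public static boolean initFromEventLoopThread() throws Exception {
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        try {
            Future<Boolean> init =
                    eventLoop.submit(() -> requestSwitchDescription(eventLoop, 500));
            return init.get();
        } finally {
            eventLoop.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("init from other thread succeeded: " + initFromOtherThread());
        System.out.println("init from event-loop thread succeeded: " + initFromEventLoopThread());
    }
}
```

In this sketch the first call succeeds and the second one fails after the timeout, which matches what I see when the EventLoopGroup thread ends up calling instantiateServiceInstance() itself. One possible direction, offered only as an idea, would be to hand the service start over to a different thread whenever the caller is the channel's own EventLoopGroup thread.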

_______________________________________________
openflowplugin-dev mailing list
openflowplugin-dev@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/openflowplugin-dev