Hi all,

I have tested openflowplugin many times with switches frequently going online
and offline. Many of the tests failed because openflowplugin was no longer
able to handle the switch-online event correctly.

Here are the causes I found.
1. There is no synchronization between creating and closing a ContextChain.
When a device connects, ContextChainHolderImpl creates a ContextChain for
that device and then registers a ClusterSingletonService, which can take a
while because of locking in mdsal-singleton. If the device disconnects in the
meantime (e.g. because it went idle) or the clustering services fail to
start, the ContextChain for that device is destroyed. When
ContextChain.close() examines the ClusterSingletonService registration, it
may find null because registration is still in progress. As a result, the
device's ContextChain is destroyed while its ClusterSingletonService instance
still exists in mdsal-singleton, which leads to the next issue.
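A minimal sketch of one way to close the race in issue 1, assuming the ContextChain keeps its registration handle in a field. close() marks the chain as closed under a lock, and the registering thread re-checks that flag after the slow registration call returns, closing the handle itself if close() already ran. ContextChainSketch, Registration and register() are hypothetical names for illustration, not the openflowplugin or mdsal-singleton API.

```java
import java.util.function.Supplier;

class ContextChainSketch {
    // Hypothetical stand-in for the handle returned when a ClusterSingletonService is registered.
    interface Registration { void close(); }

    private final Object lock = new Object();
    private Registration registration; // guarded by lock
    private boolean closed;            // guarded by lock

    /** Performs the (possibly slow) registration outside the lock, then publishes the result. */
    void register(Supplier<Registration> registrar) {
        synchronized (lock) {
            if (closed) {
                return; // chain already closed: do not register at all
            }
        }
        Registration reg = registrar.get(); // may block on locks inside the provider
        boolean closeNow;
        synchronized (lock) {
            closeNow = closed;
            if (!closeNow) {
                registration = reg;
            }
        }
        if (closeNow) {
            // close() ran while we were registering and saw null: undo the registration here
            reg.close();
        }
    }

    void close() {
        Registration toClose;
        synchronized (lock) {
            if (closed) {
                return;
            }
            closed = true;
            toClose = registration;
            registration = null;
        }
        if (toClose != null) {
            toClose.close();
        }
        // If toClose was null, an in-flight register() will see closed == true and close it.
    }
}
```

The point of the design is that whichever side finishes last is responsible for closing the registration, so a null seen by close() no longer leaves a stale service behind.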

2. Before getting into the second issue, let's recall how openflowjava
handles switch I/O. A channel is a connection between the controller and a
switch. Each channel is served by a single thread from the EventLoopGroup,
and that thread is responsible for both reading data from the channel and
writing data to it. When a switch comes online, a channel is created for it
and an EventLoopGroup thread reads the data the switch transmits. Once the
OpenFlow handshake succeeds, the EventLoopGroup thread calls the switch
initialization procedure in OFplugin. When that procedure reaches MDSAL for
cluster service registration, there are two possible situations.

In the first situation, the Cluster Singleton Service for the switch has not
been started yet, so the EventLoopGroup thread simply returns. Later, an akka
thread delivers an onOwnershipChanged notification to the Cluster Singleton
Service, and the initialization process for that switch is invoked
(ClusterSingletonServiceGroupImpl.startServices()). Initialization needs the
switch description (DeviceContextImpl.instantiateServiceInstance), so the
akka thread puts a request on an outbound queue and waits until it gets the
result or times out. The EventLoopGroup thread drains the outbound queue,
sends the request down to the switch, reads the response and passes it up.
The akka thread then gets the information it needs and finishes the
initialization process.

In the second situation, the Cluster Singleton Service for the switch has
already been started, so the EventLoopGroup thread goes on to start the
device service itself. Starting the device service requires the switch
description, and the thread waits for the result. However, the EventLoopGroup
thread is the one that sends requests to this switch and receives its
responses, and it is blocked waiting for that very result. The request cannot
be serviced while the thread is blocked, so the wait ends in a timeout and
device initialization fails.
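The self-wait in the second situation can be reproduced with plain java.util.concurrent, using a single-threaded executor as a hypothetical stand-in for the EventLoopGroup thread that owns the channel. blockingInit() waits on a request that only the same thread can serve, so it times out; nonBlockingInit() chains the rest of initialization onto the future and returns, leaving the thread free to serve the request. The class and method names and the 200 ms timeout are assumptions for illustration, not openflowjava code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class EventLoopSelfWaitSketch {
    // One daemon thread stands in for the EventLoopGroup thread serving a switch's channel.
    static final ExecutorService eventLoop = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "event-loop");
        t.setDaemon(true);
        return t;
    });

    // Hypothetical request: only the event-loop thread can send it and read the reply.
    static CompletableFuture<String> requestDescription() {
        return CompletableFuture.supplyAsync(() -> "switch-description", eventLoop);
    }

    // Second situation: the event-loop thread blocks on a reply that only it can produce.
    static String blockingInit() throws Exception {
        try {
            return requestDescription().get(200, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return "timeout";
        }
    }

    // Alternative: register a continuation and return, leaving the thread free to serve the request.
    static CompletableFuture<String> nonBlockingInit() {
        return requestDescription().thenApply(desc -> "initialized with " + desc);
    }

    // Runs a task on the event-loop thread and waits for it from the caller's thread.
    static <T> T runOnEventLoop(Callable<T> task) throws Exception {
        return eventLoop.submit(task).get();
    }
}
```

runOnEventLoop(EventLoopSelfWaitSketch::blockingInit) returns "timeout", while the future returned by runOnEventLoop(EventLoopSelfWaitSketch::nonBlockingInit) completes with "initialized with switch-description".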

As you may have noticed, the second situation is exactly what happens after
a device has gone through the path described in the first issue. Once the
ContextChain is closed without closing its Cluster Singleton Service
registration, the leftover service is still registered in mdsal-singleton, so
when the switch reconnects the EventLoopGroup thread takes the second path
and the device can no longer be initialized.

However, even if the first issue does not occur, the EventLoopGroup thread
can still end up starting the service on its own, because this is a legal
path in ClusterSingletonServiceGroupImpl.registerService():

    switch (localServicesState) {
        case STARTED:
            LOG.debug("Service group {} starting late-registered service {}", identifier, service);
            service.instantiateServiceInstance();
            break;
        ...
    }

There is a chance that the service group is already in the STARTED state
when the service is added; in that case instantiateServiceInstance() runs
directly on the registering thread, which here is the EventLoopGroup thread.
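A simplified model of this late-registration path, assuming a group that only tracks two states. When the group is already STARTED, registerService() starts the new service immediately on whatever executor it was given. Passing Runnable::run reproduces the behavior quoted above (the calling thread, here the EventLoopGroup thread, runs instantiateServiceInstance()); passing a separate executor is one possible way to keep that work off the event loop. ServiceGroupSketch, State, Service and startExecutor are hypothetical names, not the mdsal-singleton classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executor;

class ServiceGroupSketch {
    enum State { STANDBY, STARTED }

    interface Service { void instantiateServiceInstance(); }

    private State localServicesState = State.STANDBY;
    private final List<Service> services = new ArrayList<>();
    // Hypothetical: where late-registered services are started.
    private final Executor startExecutor;

    ServiceGroupSketch(Executor startExecutor) {
        this.startExecutor = startExecutor;
    }

    synchronized void registerService(Service service) {
        services.add(service);
        if (localServicesState == State.STARTED) {
            // Late registration: the group is already running, so start the service right away.
            startExecutor.execute(service::instantiateServiceInstance);
        }
    }

    // Called on ownership change (an akka thread in the scenario above).
    synchronized void startServices() {
        localServicesState = State.STARTED;
        services.forEach(Service::instantiateServiceInstance);
    }
}
```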

My questions are:
1. How can we prevent a ContextChain from being closed without closing its
Cluster Singleton Service registration?
2. How can we prevent the EventLoopGroup thread from waiting for the switch
description while it registers the Cluster Singleton Service?

If anyone has looked into these issues, please share your thoughts.

Thanks,
James
_______________________________________________
openflowplugin-dev mailing list
openflowplugin-dev@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/openflowplugin-dev