Re: [openflowplugin-dev] Singleton Clustering issue

Luis Gomez Palacios Tue, 14 Feb 2017 10:02:36 -0800

I think the problem is not the number of controller instances but the
number of switch connections. The latter is also something you cannot
control, as switches can connect to any number of instances at any given
time.


It is unfortunate that we found these issues now and not earlier in the
release, but I believe and hope we can still do something in the
controller and ofp projects to fix this. Especially for the former, we
should quickly identify any fix or feature we require.



On Feb 14, 2017 1:20 AM, "Jozef Bacigál" <[email protected]>
wrote:

> Anyway, my question is: should a “cluster” not be defined as at least three
> nodes? Or should there not be an odd number of controllers? With two or
> fewer controllers it is no longer a cluster, I think.
>
>
>
> Jozef
>
>
>
> *From:* Jozef Bacigál [mailto:[email protected]]
> *Sent:* Tuesday, February 14, 2017 9:55 AM
> *To:* Anil Vishnoi <[email protected]>; Abhijit Kumbhare <
> [email protected]>; Tomáš Slušný <[email protected]>; Shuva
> Jyoti Kar <[email protected]>; Luis Gomez <[email protected]>;
> Muthukumaran K <[email protected]>
> *Cc:* [email protected]
> *Subject:* Re: [openflowplugin-dev] Singleton Clustering issue
>
>
>
> Hi Anil, guys,
>
>
>
> I am facing the same issue you mention in Issue 2 with my single-layer
> implementation. The plugin cannot know whether another controller is
> connected to the switch, so the only solution (not a good one, and even
> slow, but the one I am using right now) is that when we lose mastership we
> delete the node from DS and HOPE that this happens before the new master
> writes the new node into DS. The best solution would be to have the
> information whether this was the last master in the cluster for the
> switch, and then and only then delete the node from DS. What I am trying
> right now is to hold the status until the node is deleted from DS and then
> send the ImmediateFuture back to the mdsal singleton, so the new master can
> be elected.
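The "last master" rule described above can be sketched as a small decision function. This is only an illustration under a loud assumption: the set of controllers the switch is connected to is passed in as a parameter, although the thread says the plugin has no such information today. The class and method names are hypothetical and are not part of openflowplugin or MD-SAL.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: decide whether a controller that lost mastership
// should delete the switch node from the data store (DS).
final class LastMasterSketch {

    // Returns true only when no other controller is still connected to the
    // switch, i.e. the controller losing mastership was the last possible master.
    // ASSUMPTION: connectedControllers is known; the plugin cannot obtain it today.
    static boolean shouldDeleteNode(String losingController, Set<String> connectedControllers) {
        Set<String> remaining = new HashSet<>(connectedControllers);
        remaining.remove(losingController);
        return remaining.isEmpty();
    }

    public static void main(String[] args) {
        // Switch connected to c1 and c2: c1 loses mastership, c2 can take over.
        System.out.println(shouldDeleteNode("c1", Set.of("c1", "c2")));
        // Switch connected to c1 only: c1 was the last master.
        System.out.println(shouldDeleteNode("c1", Set.of("c1")));
    }
}
```

With such a check the node would stay in DS whenever another controller can take over, so the delete would no longer have to race against the new master's write.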
>
>
>
> Anyway, this is a very bad implementation for the plugin on the part of
> the singleton service.
>
>
>
> Jozef
>
>
>
> *From:* Anil Vishnoi [mailto:[email protected] <[email protected]>]
>
> *Sent:* Tuesday, February 14, 2017 4:37 AM
> *To:* Jozef Bacigál <[email protected]>; Abhijit Kumbhare <
> [email protected]>; Tomáš Slušný <[email protected]>; Shuva
> Jyoti Kar <[email protected]>; Luis Gomez <[email protected]>;
> Muthukumaran K <[email protected]>
> *Cc:* [email protected]
> *Subject:* Singleton Clustering issue
>
>
>
> Hi Jozef/Tomas/Luis,
>
>
>
> I was investigating Bug 7736
> <https://bugs.opendaylight.org/show_bug.cgi?id=7736> and came across a few
> issues in our clustering implementation, as well as some limitations of
> singleton clustering.
>
>
>
> Issue 1: Registering application on data change notification.
>
> In the current implementation, when the plugin receives a connection from
> a device, it registers itself as a service instance with the clustering
> singleton service. After registering with the clustering service, it
> receives the notification to initialize the instance. It then tries to set
> the master role on the device and writes the device data to the data
> store. Forwarding-Rule-Manager (FRM) listens for data store notifications,
> and whenever it sees that a node is added to the data store, it registers
> itself as a service instance for that node. Given that we are using
> ClusteredDataTreeChangeListener, all the FRM instances get the node-added
> notification from the data store, and all the cluster nodes end up
> registering themselves as a service instance on the same service
> identifier. So even if the device is connected to only one controller, FRM
> registers itself on all three nodes, which is not correct behavior. This
> bug can make an openflowplugin cluster almost unusable. We have seen the
> following: you connect the device to two controllers, then disconnect it
> from the first controller and connect it back, and ownership goes to the
> second controller, where the device is also connected. You then disconnect
> the device from the second controller and reconnect it, and ownership goes
> to the third controller. Given that ownership for that service identity is
> now with controller 3, even if the device connects back to controller 1 or
> 2, those controllers don't push the master role down. This scenario can be
> triggered the moment your device disconnects from any of the controllers.
>
>
>
> Now the problem is that applications have no way to find out whether the
> device is connected to their host controller instance (unless we write
> some hardcoded controller number/name in the data store for each device,
> recording where it is connected). The only way I can see is through YANG
> notifications, where the plugin sends the nodeAdded/nodeRemoved
> notifications and applications register themselves as a service instance
> if they receive those events. That way we can avoid the problem I
> mentioned above. I pushed a patch that does this, and it resolves the
> issue.
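The difference between the two registration triggers can be illustrated with a toy model. This is a sketch under stated assumptions, not the code from the patch: controllers are plain strings, and the class and method names are hypothetical. It models only which cluster members end up registering as a service instance for one device.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of which FRM instances register as a service instance.
final class FrmRegistrationSketch {

    // Data-store trigger: a clustered listener fires on every cluster member,
    // so every member registers, no matter where the device is connected.
    // (connectedTo is deliberately unused here; that is the problem.)
    static List<String> registerOnDataStoreEvent(List<String> cluster, Set<String> connectedTo) {
        return new ArrayList<>(cluster);
    }

    // Notification trigger: nodeAdded is delivered only on the controller that
    // holds the device connection, so only those members register.
    static List<String> registerOnLocalNotification(List<String> cluster, Set<String> connectedTo) {
        List<String> registered = new ArrayList<>();
        for (String controller : cluster) {
            if (connectedTo.contains(controller)) {
                registered.add(controller);
            }
        }
        return registered;
    }

    public static void main(String[] args) {
        List<String> cluster = List.of("c1", "c2", "c3");
        Set<String> connectedTo = Set.of("c1"); // device connected to c1 only
        System.out.println(registerOnDataStoreEvent(cluster, connectedTo));
        System.out.println(registerOnLocalNotification(cluster, connectedTo));
    }
}
```

In the first case controllers that never see the device become ownership candidates, which is how ownership can end up on controller 3 in the scenario described earlier.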
>
>
>
> https://git.opendaylight.org/gerrit/#/c/51489/
>
>
>
> Issue 2: Data change notification every time the node disconnects from any
> of the controllers in the cluster
>
>
>
> In the current implementation we see that even if the device is connected
> to all three controllers, the moment the device disconnects from one of
> the controllers, applications receive a data change notification in which
> the node data is removed, and shortly after another notification with the
> node data added. The application thinks that the device just got
> disconnected from the controllers and reconnected, but in reality the
> device is still connected to the remaining two controllers. I think the
> reason behind this is that the current implementation of the singleton
> service doesn't send any notification to non-owner controllers about the
> ownership of the device (e.g. isOwner=false, hasOwner=false,
> wasOwner=false). I think because of this limitation we wrote the code so
> that whenever closeServiceInstance() is called the plugin removes the data
> from the data store, and when the other controller gets
> instantiateServiceInstance() it puts the data back into the data store.
> That generates two events for the application. Given that the device is
> connected to all the controllers, this behavior is not correct. I can't
> think of any solution that fixes this unless the singleton clustering
> service provides a specific notification about it to the other
> controllers, so that those controllers can decide whether to clean up the
> data or ignore it, given that one of them is still an owner of the device.
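The remove-then-add sequence described above can be reproduced with a toy model. This is a simplified sketch, not the plugin code: the data store is a plain set, the listener is an event list, and the two methods only borrow the names of the singleton-service callbacks mentioned in the thread. The node id "openflow:1" is an illustrative value.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the data-store events an application sees on handover.
final class OwnershipHandoverSketch {
    private final Set<String> dataStore = new HashSet<>();
    private final List<String> events = new ArrayList<>();

    // New owner writes the node into the data store.
    void instantiateServiceInstance(String nodeId) {
        if (dataStore.add(nodeId)) {
            events.add("ADDED " + nodeId);
        }
    }

    // Old owner removes the node, even though other controllers are still connected.
    void closeServiceInstance(String nodeId) {
        if (dataStore.remove(nodeId)) {
            events.add("REMOVED " + nodeId);
        }
    }

    // Device connected to c1, c2 and c3; it then drops only its c1 connection.
    static List<String> simulateHandover() {
        OwnershipHandoverSketch sketch = new OwnershipHandoverSketch();
        sketch.instantiateServiceInstance("openflow:1"); // c1 becomes owner
        sketch.closeServiceInstance("openflow:1");       // c1 loses the device
        sketch.instantiateServiceInstance("openflow:1"); // c2 is elected owner
        return sketch.events;
    }

    public static void main(String[] args) {
        System.out.println(simulateHandover());
    }
}
```

The application sees a removal followed by an addition for the same node, although the device never lost its connections to the other two controllers.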
>
>
>
> The same functional behavior can create another issue. If the device is
> connected to only one controller in the cluster and the user kills that
> controller, it leaves stale data in the data store, because the other
> controllers won't be notified, given that they didn't register as a
> service instance for the service-group-id. I think this is a major
> limitation, and I am not sure the plugin can resolve it by itself (unless
> we use an EOS + Singleton Clustering Service hack to make it work).
>
>
>
> Let me know your thoughts.
>
>
>
> Side question: does anybody know if any enhancement is proposed in the
> md-sal project that can help solve this issue?
>
>
>
> --
>
> Thanks
>
> Anil
>
>
>
> Jozef Bacigál
>
> Senior Software Engineer
>
>
> Sídlo / Mlynské Nivy 56 / 821 05 Bratislava / Slovakia
> R&D centrum / Janka Kráľa 9 /  974 01 Banská Bystrica / Slovakia
> +421 908 766 972 / [email protected]
> reception: +421 2 206 65 114 / www.pantheon.tech
>
>
>
>

_______________________________________________
openflowplugin-dev mailing list
[email protected]
https://lists.opendaylight.org/mailman/listinfo/openflowplugin-dev
