Re: [openflowplugin-dev] Singleton Clustering issue

Anil Vishnoi Fri, 17 Feb 2017 17:49:34 -0800

I opened a bug against md-sal for the enhancement:
https://bugs.opendaylight.org/show_bug.cgi?id=7820
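
As a rough illustration of what such an enhancement could enable (a sketch
under assumptions, not md-sal code): suppose every cluster member received an
ownership change carrying the wasOwner/isOwner/hasOwner flags mentioned in the
thread below. `OwnershipChange` and `handover_events` are hypothetical names
made up for this example. The plugin could then keep the node in the data
store during a handover and remove it only when no owner is left:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OwnershipChange:
    """Hypothetical ownership notification; not an md-sal type."""
    was_owner: bool
    is_owner: bool
    has_owner: bool


def handover_events(changes, use_has_owner):
    """Return the data store events an application would observe."""
    node_present = True  # the previous owner already wrote the node
    events = []
    for change in changes:
        if use_has_owner:
            # Assumed enhancement: clean up only when nobody owns the device.
            remove = not change.has_owner
        else:
            # Behavior described in the thread: remove on closeServiceInstance().
            remove = change.was_owner and not change.is_owner
        if remove and node_present:
            node_present = False
            events.append("removed")
        # The new owner writes the node back only if it is missing.
        if change.is_owner and not node_present:
            node_present = True
            events.append("added")
    return events


# One controller loses ownership, another one takes over.
handover = [
    OwnershipChange(was_owner=True, is_owner=False, has_owner=True),
    OwnershipChange(was_owner=False, is_owner=True, has_owner=True),
]
print(handover_events(handover, use_has_owner=False))
print(handover_events(handover, use_has_owner=True))
```

With the close/instantiate logic the application sees a remove followed by an
add for a plain handover; with the hasOwner-based rule it sees nothing until
the last owner is gone.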


On Fri, Feb 17, 2017 at 5:47 PM, Anil Vishnoi <[email protected]> wrote:

> Sorry, I think I made a minor mistake: it's not the device disconnecting,
> it's the controller dying. If the controller dies, nobody cleans up the
> data store, so FRM won't deregister, and in that case the FRM on the third
> controller can get the ownership.
>
> On Thu, Feb 16, 2017 at 12:32 AM, guo <[email protected]> wrote:
>
>> Hi Anil,
>>
>> Why is it happening in Issue 1?
>> *"and then you disconnect the device from second controller and reconnect
>> it, ownership goes to third controller"*
>>
>> I found that when the device disconnects from the second controller, the
>> device data in the data store is deleted. So the FRM deregisters the
>> service instance on the third controller, and the ownership goes to the
>> first controller.
>>
>> guo
>>
>> ------------------ Original Message ------------------
>> *From:* "Anil Vishnoi";<[email protected]>;
>> *Sent:* Thursday, February 16, 2017, 4:32 AM
>> *To:* "Jozef Bacigál"<[email protected]>;
>> *Cc:* "[email protected]"<openflowplug
>> [email protected]>;
>> *Subject:* Re: [openflowplugin-dev] Singleton Clustering issue
>>
>> Hi Jozef,
>>
>> I don't think this solves the issue. It just makes sure that the node is
>> deleted first and added again afterwards, so that the user can see the
>> node. But this delete and add creates two data change notifications for
>> the application and gives the impression that the device was disconnected
>> and reconnected, which is not really the case. I think the ideal solution,
>> as you mentioned, is for the clustering service to provide a notification
>> saying the device has no owner, so that the plugin can clean up. I think
>> we should raise a bug with the clustering team to provide this kind of
>> API, so that we can use it for a proper solution.
>>
>> On Tue, Feb 14, 2017 at 12:54 AM, Jozef Bacigál <
>> [email protected]> wrote:
>>
>>> Hi Anil, guys
>>>
>>>
>>>
>>> I am facing the same issue you mentioned in Issue 2 with my single-layer
>>> implementation. The plugin is not able to know whether another controller
>>> is connected to the switch, so the only solution (not a good one, and
>>> slow; it is what I am using right now) is that when we lose mastership we
>>> delete the node from the DS and HOPE that this happens sooner than the new
>>> master writes the new node into the DS. The best solution would be to have
>>> the information whether this was the last master in the cluster for the
>>> switch, and then and only then delete the node from the DS. What I am
>>> trying right now is to hold the status until the node is deleted from the
>>> DS and then send the ImmediateFuture back to the mdsal singleton, so the
>>> new master can be elected.
>>>
>>>
>>>
>>> Anyway, it is a very bad implementation FOR the plugin from the singleton
>>> service.
>>>
>>>
>>>
>>> Jozef
>>>
>>>
>>>
>>> *From:* Anil Vishnoi [mailto:[email protected]]
>>> *Sent:* Tuesday, February 14, 2017 4:37 AM
>>> *To:* Jozef Bacigál <[email protected]>; Abhijit Kumbhare <
>>> [email protected]>; Tomáš Slušný <[email protected]>;
>>> Shuva Jyoti Kar <[email protected]>; Luis Gomez <
>>> [email protected]>; Muthukumaran K <[email protected]>
>>> *Cc:* [email protected]
>>> *Subject:* Singleton Clustering issue
>>>
>>>
>>>
>>> Hi Jozef/Tomas/Luis,
>>>
>>>
>>>
>>> I was investigating Bug 7736
>>> <https://bugs.opendaylight.org/show_bug.cgi?id=7736> and came across a
>>> few issues in our clustering implementation, as well as some limitations
>>> of singleton clustering.
>>>
>>>
>>>
>>> Issue 1 : Registering application on data change notification.
>>>
>>> In the current implementation, when the plugin receives a connection from
>>> a device, it registers itself as a service instance with the clustering
>>> singleton service. After registering, it receives the notification to
>>> initialize the instance. It then tries to set the master role on the
>>> device and writes the device data to the data store.
>>> Forwarding-Rule-Manager (FRM) listens for data store notifications, and
>>> whenever it sees that a node has been added to the data store, it
>>> registers itself as a service instance for that node. Because we are using
>>> ClusteredDataTreeChangeListener, all the FRM instances get the node-added
>>> notification from the data store, and all the cluster nodes end up
>>> registering themselves as service instances on the same service
>>> identifier. So even if the device is connected to only one controller, FRM
>>> registers itself on all three nodes, which is not correct behavior. This
>>> bug can make an openflowplugin cluster almost unusable. We have seen a
>>> case where you connect the device to two controllers, disconnect it from
>>> the first controller and connect it back, and ownership goes to the second
>>> controller, where the device is also connected. If you then disconnect the
>>> device from the second controller and reconnect it, ownership goes to the
>>> third controller. Since ownership of that service identity is now with
>>> controller 3, even if the device connects back to controller 1 or 2, those
>>> controllers don't push the master role down. This scenario can be
>>> triggered the moment your device disconnects from any of the controllers.
>>>
>>>
>>>
>>> The problem is that applications have no way to find out whether the
>>> device is connected to their host controller instance (unless we write
>>> some hardcoded controller number/name into the data store for each device,
>>> recording where it is connected). The only way I can see is through YANG
>>> notifications: the plugin sends nodeAdded/nodeRemoved notifications, and
>>> applications register themselves as service instances when they receive
>>> those events. That way we can avoid the problem I mentioned above. I
>>> pushed a patch that does this, and it resolves the issue.
>>>
>>>
>>>
>>> https://git.opendaylight.org/gerrit/#/c/51489/
>>>
>>>
>>>
>>> Issue 2: Data change notification every time the device disconnects from
>>> any node in the cluster
>>>
>>>
>>>
>>> In the current implementation, even if the device is connected to all
>>> three controllers, the moment the device disconnects from one of them,
>>> applications receive a data change notification where the node data is
>>> removed, and shortly after another notification with the node data added.
>>> The application thinks that the device just got disconnected from the
>>> controllers and reconnected, but in reality the device is still connected
>>> to the remaining two controllers. I think the reason is that the current
>>> implementation of the singleton service doesn't send any notification to
>>> non-owner controllers about the ownership of the device (e.g.
>>> isOwner=false, hasOwner=false, wasOwner=false). I think because of this
>>> limitation we wrote the code so that whenever closeServiceInstance() is
>>> called, the plugin removes the data from the data store, and when the
>>> other controller gets instantiateServiceInstance(), it puts the data back
>>> into the data store. That generates two events for the application. Given
>>> that the device is connected to all the controllers, this behavior is not
>>> correct. I can't think of any solution that fixes this unless the
>>> singleton clustering service provides a specific notification about it to
>>> the other controllers, so that those controllers can decide whether to
>>> clean up the data or ignore it, given that one of them is still an owner
>>> of the device.
>>>
>>>
>>>
>>> The same behavior can create another issue. If the device is connected to
>>> only one controller in the cluster and the user kills that controller, it
>>> leaves stale data in the data store, because the other controllers won't
>>> be notified, given that they didn't register as a service instance for the
>>> service-group-id. I think this is a major limitation, and I am not sure
>>> the plugin can resolve it by itself (unless we use an EOS + Singleton
>>> Clustering Service hack to make it work).
>>>
>>>
>>>
>>> Let me know your thoughts.
>>>
>>>
>>>
>>> Side question: does anybody know whether any enhancement has been
>>> proposed in the md-sal project that could help solve this issue?
>>>
>>>
>>>
>>> --
>>>
>>> Thanks
>>>
>>> Anil
>>>
>>>
>>>
>>> Jozef Bacigál
>>>
>>> Senior Software Engineer
>>>
>>>
>>> Sídlo / Mlynské Nivy 56 / 821 05 Bratislava / Slovakia
>>> R&D centrum / Janka Kráľa 9 /  974 01 Banská Bystrica / Slovakia
>>> +421 908 766 972 / [email protected]
>>> reception: +421 2 206 65 114 / www.pantheon.tech
>>>
>>>
>>>
>>
>>
>>
>> --
>> Thanks
>> Anil
>>
>
>
>
> --
> Thanks
> Anil
>
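
The failure case corrected above (the controller dies while FRM is registered
on every cluster member) can be sketched with a toy model. This is not the
singleton service API: `frm_owner_after_failure` is a hypothetical helper, and
the rule that the first surviving candidate becomes owner is an assumption
made only to keep the example deterministic.

```python
def frm_owner_after_failure(cluster, connected_to, failed, register_everywhere):
    """Return which controller's FRM owns the device service after `failed` dies.

    cluster: controller names in (assumed) election order.
    connected_to: controllers the device actually has a connection to.
    register_everywhere: True models FRM registering on every member because
        of the clustered data store listener; False models registering only
        where the device is locally connected.
    """
    if register_everywhere:
        candidates = list(cluster)
    else:
        candidates = [c for c in cluster if c in connected_to]
    # The dead controller drops out of the candidate list.
    survivors = [c for c in candidates if c != failed]
    # Assumption: ownership goes to the first surviving candidate.
    return survivors[0] if survivors else None


cluster = ["controller1", "controller2", "controller3"]
connected = {"controller1"}

owner = frm_owner_after_failure(cluster, connected, "controller1", True)
# The new owner has no connection to the device, so it cannot program it.
print(owner, owner in connected)

owner = frm_owner_after_failure(cluster, connected, "controller1", False)
print(owner)
```

In the first call ownership lands on a controller the device never connected
to; in the second, no FRM instance is left holding the service.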



-- 
Thanks
Anil

_______________________________________________
openflowplugin-dev mailing list
[email protected]
https://lists.opendaylight.org/mailman/listinfo/openflowplugin-dev
