Re: [L2switch-dev] [openflowplugin-dev] Blocker bug analysis

2016-10-19 Thread Luis Gomez
Hi Sai, after looking at the test in more detail, this is not exactly the 
behavior (sorry for the confusion). What is really happening is:

- Mininet restarts
- There is no host in address-tracker after restart
- We do the classic mininet pingall test
- After the pingall test, the host addresses (10.0.0.1, 10.0.0.2, 10.0.0.3) are 
seen in all switches, while in the normal scenario 10.0.0.1 should be only on 
the s1 to-host port, 10.0.0.2 only on the s2 to-host port and 10.0.0.3 only on 
the s3 to-host port.

The normal scenario happens when we leave 2 secs between mininet stop + start.
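
The expected placement described above can be checked mechanically. A minimal sketch, assuming the address-tracker observations have already been collected into a plain dict of switch name to the set of IPs seen on it (the function name and data shapes are illustrative, not part of l2switch or the CSIT suite):

```python
# Hypothetical helper: flag host IPs that show up on a switch other than
# the one the host is attached to. Data shapes are assumptions for illustration.

def misplaced_addresses(observed, expected_switch):
    """Return sorted (ip, switch) pairs where ip was seen on the wrong switch."""
    wrong = []
    for switch, ips in observed.items():
        for ip in ips:
            if expected_switch.get(ip) != switch:
                wrong.append((ip, switch))
    return sorted(wrong)


expected_switch = {"10.0.0.1": "s1", "10.0.0.2": "s2", "10.0.0.3": "s3"}

# Normal scenario: each address is seen only on its own switch.
normal = {"s1": {"10.0.0.1"}, "s2": {"10.0.0.2"}, "s3": {"10.0.0.3"}}

# Failing scenario: every address is seen on every switch.
all_ips = set(expected_switch)
failing = {"s1": set(all_ips), "s2": set(all_ips), "s3": set(all_ips)}
```

With three hosts and three switches, the failing scenario yields six misplaced (ip, switch) pairs, while the normal scenario yields none.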

So 2 questions:

- What is the impact of the address tracker registering remote IP addresses on 
the switch-to-switch ports? As far as I can see, flows are generated correctly 
even in this case.
- Any idea why the address tracker would get confused and add IP addresses to 
the switch-to-switch ports? Maybe the application fails to identify these ports 
as switch-to-switch?
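
One way to test the second hypothesis is to derive the switch-to-switch ports from the topology links and compare them with the ports that carry addresses. A rough sketch, assuming links are available as pairs of termination-point IDs and that switch ports are named `openflow:<node>:<port>`; the helper names and sample IDs are hypothetical, and this is not how l2switch itself does it:

```python
def is_switch_port(tp_id):
    """Assumption: switch termination points are named 'openflow:<node>:<port>'."""
    return tp_id.startswith("openflow:")


def inter_switch_ports(links):
    """Return the set of ports that sit on a link whose both ends are switches."""
    ports = set()
    for src, dst in links:
        if is_switch_port(src) and is_switch_port(dst):
            ports.update((src, dst))
    return ports


def addresses_on_inter_switch_ports(addresses_by_port, links):
    """Return {port: ips} for addresses recorded on switch-to-switch ports."""
    internal = inter_switch_ports(links)
    return {p: ips for p, ips in addresses_by_port.items() if p in internal and ips}


# Linear three-switch topology with one host per switch (illustrative IDs).
links = [
    ("openflow:1:2", "openflow:2:2"),
    ("openflow:2:3", "openflow:3:2"),
    ("host:00:00:00:00:00:01", "openflow:1:1"),
    ("host:00:00:00:00:00:02", "openflow:2:1"),
    ("host:00:00:00:00:00:03", "openflow:3:1"),
]
addresses_by_port = {
    "openflow:1:1": {"10.0.0.1"},
    "openflow:1:2": {"10.0.0.2", "10.0.0.3"},  # remote hosts on an inter-switch port
}
suspicious = addresses_on_inter_switch_ports(addresses_by_port, links)
```

Any non-empty result would point at addresses being learned on ports that should have been excluded.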

BR/Luis


> On Oct 19, 2016, at 7:42 AM, Luis Gomez wrote:
> 
> Timer=600 sec, but in Boron when we leave just 2 secs between mininet stop & 
> start, we do not see any IP address. Also, in Beryllium we do not see any IP 
> address after restarting mininet, so are you sure switch disconnect/connect 
> does not impact the address tracker cache?
> 
> BR/Luis
> 
>> On Oct 19, 2016, at 3:37 AM, Sai MarapaReddy wrote:
>> 
>> Hi Luis,
>> 
>> Addresses will be removed if this condition [1] passes.
>> 
>> The default value of 'timeStampUpdateInterval' is defined here [2].
>> 
>> Please try reducing the default value to make sure the addresses are 
>> flushed. 
>> 
>> Since mininet connects with the same IP / MAC values every time, addresses 
>> will only be removed if the condition at [1] is met.
>> 
>> [1] 
>> https://github.com/opendaylight/l2switch/blob/master/addresstracker/implementation/src/main/java/org/opendaylight/l2switch/addresstracker/addressobserver/AddressObservationWriter.java#L144
>> 
>> [2] 
>> https://github.com/opendaylight/l2switch/blob/master/addresstracker/implementation/src/main/yang/address-tracker-config.yang#L17
>> 
>> On Tue, Oct 18, 2016 at 7:30 PM, Luis Gomez wrote:
>> Thanks Sai, yes I have a question regarding the first issue below: any idea 
>> why the address tracker would keep discovered IP address information after 
>> restarting mininet? What is in the code today to flush the address cache, 
>> and why is this not working when mininet restarts?
>> 
>> BR/Luis
>> 
>>> On Oct 18, 2016, at 6:56 PM, Sai MarapaReddy wrote:
>>> 
>>> Thanks Luis. 
>>> 
>>> Yes, that is the drop-all flow. Please let us know if you need any further 
>>> help from l2switch. 
>>> 
>>> On Tue, Oct 18, 2016 at 1:03 PM, Luis Gomez wrote:
>>> Actually this is the drop flow, so it is expected. Anyway, going deeper into 
>>> the l2switch test issues, I see:
>>> 
>>> - The address tracker issue is because host addresses are not getting 
>>> cleared when we restart mininet with no sleep in-between.
>>> - The loop remover issue could be related to some test issue; I am trying 
>>> to clean up/repair the test here: 
>>> https://git.opendaylight.org/gerrit/#/c/47095/ 
>>> 
>>> 
>>> BR/Luis
>>> 
>>> > On Oct 18, 2016, at 10:02 AM, Luis Gomez wrote:
>>> >
>>> > There are definitely weird things going on in l2switch. The first ERROR 
>>> > you mention produces a weird flow (no match and no action) in the 
>>> > switch:
>>> >
>>> > Flow 55:
>>> >     "cookie": 3098476543630901303,
>>> >     "flags": "",
>>> >     "hard-timeout": 0,
>>> >     "id": "55",
>>> >     "idle-timeout": 0,
>>> >     "match": {},
>>> >     "opendaylight-flow-statistics:flow-statistics": {
>>> >         "byte-count": 0,
>>> >         "duration": {
>>> >             "nanosecond": 23900,
>>> >             "second": 36
>>> >         },
>>> >         "packet-count": 0
>>> >     },
>>> >     "priority": 0,
>>> >     "table_id": 0
>>> >
>>> >
>>> > For the second issue, I am not sure what the impact is, but it is 
>>> > definitely something to look at.
>>> >
>>> > I think in general we will need l2switch support to troubleshoot and 

Re: [L2switch-dev] [openflowplugin-dev] Blocker bug analysis

2016-10-18 Thread Luis Gomez
Actually this is the drop flow, so it is expected. Anyway, going deeper into 
the l2switch test issues, I see:

- The address tracker issue is because host addresses are not getting cleared 
when we restart mininet with no sleep in-between.
- The loop remover issue could be related to some test issue; I am trying to 
clean up/repair the test here: https://git.opendaylight.org/gerrit/#/c/47095/

BR/Luis 

> On Oct 18, 2016, at 10:02 AM, Luis Gomez wrote:
> 
> There are definitely weird things going on in l2switch. The first ERROR you 
> mention produces a weird flow (no match and no action) in the switch:
> 
> Flow 55:
>     "cookie": 3098476543630901303,
>     "flags": "",
>     "hard-timeout": 0,
>     "id": "55",
>     "idle-timeout": 0,
>     "match": {},
>     "opendaylight-flow-statistics:flow-statistics": {
>         "byte-count": 0,
>         "duration": {
>             "nanosecond": 23900,
>             "second": 36
>         },
>         "packet-count": 0
>     },
>     "priority": 0,
>     "table_id": 0
> 
> 
> For the second issue, I am not sure what the impact is, but it is definitely 
> something to look at.
> 
> I think in general we will need l2switch support to troubleshoot and make 
> progress on the current issues.
> 
> BR/Luis
> 
> 
>> On Oct 17, 2016, at 8:09 AM, Miroslav Macko wrote:
>> 
>> Hello Luis and dev guys,
>> 
>> There is this info from l2-switch in the karaf log:
>> 
>> 2016-10-14 22:43:24,941 | INFO  | pool-16-thread-1 | FlowWriterServiceImpl   
>>  | 229 - org.opendaylight.l2switch.main.impl - 0.5.0.SNAPSHOT | In 
>> addMacToMacFlowsUsingShortestPath: No flows added. Source and Destination 
>> ports are same.
>> 
>> Is it ok?
>> 
>> And this debug message (it has incorrect severity):
>> 
>> 2016-10-14 22:43:55,282 | ERROR | entLoopGroup-5-3 | DeviceFlowRegistryImpl  
>>  | 210 - org.opendaylight.openflowplugin.impl - 0.4.0.SNAPSHOT | 
>> Flow with flowId 85 already exists in table 0
>> 
>> So it looks like flows with the same ID are being added.
>> 
>> In short, what are these failing tests doing, and what should they test, 
>> please?
>> 
>> Thanks,
>> Miro
>> 
>> 
>> From: Luis Gomez 
>> Sent: October 15, 2016 1:01
>> To: Tomáš Slušný
>> Cc: An Ho; openflowplugin-dev; OpenDayLight-L2switch-Dev
>> Subject: Re: [openflowplugin-dev] [L2switch-dev] Blocker bug analysis
>> 
>> No luck :), still fails on "address tracker" and "loop remover":
>> 
>> https://jenkins.opendaylight.org/releng/view/l2switch/job/l2switch-csit-1node-switch-only-carbon/154/
>> https://logs.opendaylight.org/releng/jenkins092/l2switch-csit-1node-switch-only-carbon/154/archives/log.html.gz
>> 
>> BR/Luis
>> 
>> 
>>> On Oct 14, 2016, at 2:26 PM, Luis Gomez wrote:
>>> 
>>> OK Tomas, I will try your patch 
>>> https://git.opendaylight.org/gerrit/#/c/46390/ on 
>>> l2switch-csit-1node-switch-only-carbon and let you know.
>>> 
>>> 

Re: [L2switch-dev] [openflowplugin-dev] Blocker bug analysis

2016-10-17 Thread Shuva Jyoti Kar
Latest on this.

The bug description changed a bit over the weekend.
The issue was seen with big flows (lots of matches), just adding (no deleting) 
different big flows every 8 secs. I tried it around 13 times today back-to-back 
with an 8 sec gap in between, but was not able to hit the issue.
I varied the ipv6-destination-address for each of the flows.
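
A minimal sketch of how such a series of distinct flows could be generated, assuming the only per-flow difference is the IPv6 destination. The base address (taken from the 2001:db8::/32 documentation prefix), the function name and the flow IDs are illustrative and not taken from the bug report:

```python
import ipaddress


def flow_variants(count, base="2001:db8::1"):
    """Yield (flow_id, ipv6_destination) pairs, one distinct /128 per flow."""
    start = ipaddress.IPv6Address(base)
    for i in range(count):
        # Adding an int to an IPv6Address gives the next address in sequence.
        yield str(i + 1), f"{start + i}/128"


# 13 back-to-back attempts, as in the run described above.
variants = list(flow_variants(13))
```

Each pair would then be substituted into the flow body before it is pushed, with the 8 sec pause applied between pushes.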

Thanks
Shuva

From: Shuva Jyoti Kar
Sent: Saturday, October 15, 2016 12:08 PM
To: 'Abhijit Kumbhare'; Luis Gomez
Cc: An Ho; openflowplugin-dev; OpenDayLight-L2switch-Dev
Subject: RE: [openflowplugin-dev] [L2switch-dev] Blocker bug analysis

About 6917, I have updated the bug with my observations.

I tried it about 23 times (add/get/del) at intervals of 5 secs (using a 
stopwatch) but could see the alien flow ID in the oper DS only once, and even 
then it vanished the moment I tried a re-get.

I did use a different flow object, but I do not think that should cause any 
problem.

In any case I will retry with the one mentioned in the original problem 
description and then report back.

Thanks
Shuva

From: openflowplugin-dev-boun...@lists.opendaylight.org 
[mailto:openflowplugin-dev-boun...@lists.opendaylight.org] On Behalf Of Abhijit 
Kumbhare
Sent: Friday, October 14, 2016 10:24 PM
To: Luis Gomez
Cc: An Ho; openflowplugin-dev; OpenDayLight-L2switch-Dev
Subject: Re: [openflowplugin-dev] [L2switch-dev] Blocker bug analysis

About the first one - https://bugs.opendaylight.org/show_bug.cgi?id=6575:

Tomas said in yesterday's meeting that this was dependent on 
https://bugs.opendaylight.org/show_bug.cgi?id=6710, and that it was fixed. Can 
we retest? Also a question for Tomas: 6710 is cluster related, while for 6575 it 
is not clear whether it is cluster or single node. Are we sure 6575 is fixed?

The second one: https://bugs.opendaylight.org/show_bug.cgi?id=6917 - Shuva will 
respond about that at some point.



On Thu, Oct 13, 2016 at 9:50 PM, Luis Gomez wrote:
Hi all,

We are getting close to Boron SR1 so I think it makes sense to review the 2 
blocking issues we have:


1) https://bugs.opendaylight.org/show_bug.cgi?id=6575

Summary:

l2switch does not work well when mininet is disconnected and connected with no 
time in-between.

Description:

This is a fairly old issue: since the He->Li migration, l2switch has 
experienced random failures in the system test:

https://jenkins.opendaylight.org/releng/view/l2switch/job/l2switch-csit-1node-switch-only-boron/

Same test passes fine in Beryllium as you can see below:

https://jenkins.opendaylight.org/releng/view/l2switch/job/l2switch-csit-1node-switch-only-beryllium/

The last discovery (just before Boron release) was that giving more time 
between stop mininet and start mininet made the suite pass.

Criticality:

Although this was a clear regression in the l2switch test (Beryllium -> Boron), 
this bug was not initially marked as a blocker because it was not trivial to 
reproduce (e.g. it needs a switch connection flap).

Risk of not fixing:

l2switch and other similar applications relying on ofplugin may not work well 
when switch connection flaps.


2) https://bugs.opendaylight.org/show_bug.cgi?id=6917

Summary:

Flow matching function (operational flow reconciliation) is not stable.

Description:

I discovered this issue while doing some random flow push tests on my laptop 
using Postman: adding and deleting the same flow a few times produced an alien 
ID in the operational flow.
After that I created a test that does exactly that: add flow, verify 
operational ID, delete flow, sleep 5s, repeat. With these simple steps the 
issue shows consistently for Boron (new test):

https://logs.opendaylight.org/releng/jenkins092/openflowplugin-csit-1node-flow-services-only-boron/758/archives/log.html.gz

But not in Beryllium:

https://logs.opendaylight.org/releng/jenkins092/openflowplugin-csit-1node-flow-services-only-beryllium/1854/archives/log.html.gz
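
The steps of that test (add flow, verify operational ID, delete flow, sleep, repeat) can be sketched as a small loop. This is only an outline under stated assumptions: the controller calls are passed in as plain callables, `FakeController` is a stand-in written for this example, and `"alien-1"` is a placeholder rather than a real alien ID. Against a real controller the three callables would wrap the corresponding REST requests, and the verify step would likely need to wait for the operational datastore to update.

```python
import time


def run_flow_id_check(add_flow, get_operational_ids, delete_flow,
                      flow_id, rounds, pause_s=5.0, sleep=time.sleep):
    """Repeat add -> verify -> delete -> sleep; return rounds where the ID was missing."""
    failures = []
    for n in range(1, rounds + 1):
        add_flow(flow_id)
        if flow_id not in get_operational_ids():
            failures.append(n)
        delete_flow(flow_id)
        sleep(pause_s)
    return failures


class FakeController:
    """Stand-in for the controller; reports a placeholder alien ID on chosen rounds."""

    def __init__(self, alien_rounds):
        self.alien_rounds = set(alien_rounds)
        self.round = 0
        self.operational = set()

    def add(self, flow_id):
        self.round += 1
        alien = self.round in self.alien_rounds
        self.operational = {"alien-1" if alien else flow_id}

    def ids(self):
        return set(self.operational)

    def delete(self, flow_id):
        self.operational.clear()


fake = FakeController(alien_rounds=[2])
# Skip the real 5 s pause in this demonstration.
failed = run_flow_id_check(fake.add, fake.ids, fake.delete, "1", rounds=3,
                           sleep=lambda s: None)
```

Returning the failing round numbers, rather than stopping at the first mismatch, makes it easier to see how often the alien ID shows up over a longer run.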

Criticality:

Besides the test regression, I think there are applications in ODL relying on 
operational flow ID that would be negatively impacted by this bug.

Risk of not fixing:

OF applications relying on operational flow ID (e.g. to confirm flows) can 
sporadically fail.


___
L2switch-dev mailing list
L2switch-dev@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/l2switch-dev



Re: [L2switch-dev] [openflowplugin-dev] Blocker bug analysis

2016-10-17 Thread Miroslav Macko
Hello Luis and dev guys,

There is this info from l2-switch in the karaf log:

2016-10-14 22:43:24,941 | INFO  | pool-16-thread-1 | FlowWriterServiceImpl  
  | 229 - org.opendaylight.l2switch.main.impl - 0.5.0.SNAPSHOT | In 
addMacToMacFlowsUsingShortestPath: No flows added. Source and Destination ports 
are same.

Is it ok?

And this debug message (it has incorrect severity):

2016-10-14 22:43:55,282 | ERROR | entLoopGroup-5-3 | DeviceFlowRegistryImpl 
  | 210 - org.opendaylight.openflowplugin.impl - 0.4.0.SNAPSHOT | Flow with 
flowId 85 already exists in table 0

So it looks like flows with the same ID are being added.
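
To see how often this happens and for which flow IDs, the karaf log can be scanned for that message. A small sketch, assuming each log record has first been joined onto a single line (the mail client wrapped the record above); the function name is illustrative:

```python
import re
from collections import Counter

# Matches the message text quoted above.
DUPLICATE_FLOW = re.compile(r"Flow with flowId (\S+) already exists in table (\d+)")


def duplicate_flow_counts(lines):
    """Count 'already exists' messages per (table, flow ID)."""
    counts = Counter()
    for line in lines:
        m = DUPLICATE_FLOW.search(line)
        if m:
            counts[(int(m.group(2)), m.group(1))] += 1
    return counts


sample = [
    "2016-10-14 22:43:55,282 | ERROR | entLoopGroup-5-3 | DeviceFlowRegistryImpl "
    "| 210 - org.opendaylight.openflowplugin.impl - 0.4.0.SNAPSHOT | "
    "Flow with flowId 85 already exists in table 0",
]
counts = duplicate_flow_counts(sample)
```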

In short, what are these failing tests doing, and what should they test, please?

Thanks,
Miro


From: Luis Gomez 
Sent: October 15, 2016 1:01
To: Tomáš Slušný
Cc: An Ho; openflowplugin-dev; OpenDayLight-L2switch-Dev
Subject: Re: [openflowplugin-dev] [L2switch-dev] Blocker bug analysis

No luck :), still fails on "address tracker" and "loop remover":

https://jenkins.opendaylight.org/releng/view/l2switch/job/l2switch-csit-1node-switch-only-carbon/154/
https://logs.opendaylight.org/releng/jenkins092/l2switch-csit-1node-switch-only-carbon/154/archives/log.html.gz

BR/Luis


> On Oct 14, 2016, at 2:26 PM, Luis Gomez wrote:
>
> OK Tomas, I will try your patch 
> https://git.opendaylight.org/gerrit/#/c/46390/ on 
> l2switch-csit-1node-switch-only-carbon and let you know.
>
>
>> On Oct 14, 2016, at 1:23 PM, Tomáš Slušný wrote:
>>
>> Also, 6710 has so far been merged only in the master branch, not yet in 
>> Boron. It has a cherry-pick ready here: 
>> https://git.opendaylight.org/gerrit/#/c/46761/, but until that is merged, I 
>> think it would be better to check for improvement in the Jenkins test for 
>> Carbon here: 
>> https://jenkins.opendaylight.org/releng/view/l2switch/job/l2switch-csit-1node-switch-only-carbon/
>>
>> From: Tomáš Slušný 
>> Sent: October 14, 2016 22:13
>> To: Luis Gomez; Abhijit Kumbhare
>> Cc: An Ho; openflowplugin-dev; OpenDayLight-L2switch-Dev
>> Subject: Re: [openflowplugin-dev] [L2switch-dev] Blocker bug analysis
>>
>> 6710 was related to both cluster and single node because, even in the single 
>> node scenario, we register with ClusterSingletonService, which had problems 
>> when we tried to re-register before the previous registration was fully 
>> closed (that is, when we disconnected and reconnected too fast). Most of the 
>> fix was done in bug 6710, but part of it also had to be done on the 
>> openflowplugin side in this gerrit patch: 
>> https://git.opendaylight.org/gerrit/#/c/46390/. So, Luis, is it possible to 
>> run the failing l2switch test on that patch and see whether it solves 
>> anything?
>>
>> From: Luis Gomez 
>> Sent: October 14, 2016 21:49
>> To: Abhijit Kumbhare
>> Cc: An Ho; openflowplugin-dev; OpenDayLight-L2switch-Dev
>> Subject: Re: [openflowplugin-dev] [L2switch-dev] Blocker bug analysis
>>
>> Well, I think if the openflowplugin people cannot figure out where the issue 
>> is, the next step would be to get some debug and further analysis from the 
>> l2switch people explaining the application misbehavior.
>>
>> If there are no people available in l2switch to do this, we can lower the 
>> priority to major or critical.
>>
>> BR/Luis
>>
>>
>>> On Oct 14, 2016, at 12:35 PM, Luis Gomez wrote:
>>>
>>> As far as I can tell, nothing we have done so far has helped the l2switch 
>>> test; it is still failing.
>>>
>>> BR/Luis
>>>

Re: [L2switch-dev] [openflowplugin-dev] Blocker bug analysis

2016-10-15 Thread Shuva Jyoti Kar
About 6917, I have updated the bug with my observations.

I tried it about 23 times (add/get/del) at intervals of 5 secs (using a 
stopwatch) but could see the alien flow ID in the oper DS only once, and even 
then it vanished the moment I tried a re-get.

I did use a different flow object, but I do not think that should cause any 
problem.

In any case I will retry with the one mentioned in the original problem 
description and then report back.

Thanks
Shuva

From: openflowplugin-dev-boun...@lists.opendaylight.org 
[mailto:openflowplugin-dev-boun...@lists.opendaylight.org] On Behalf Of Abhijit 
Kumbhare
Sent: Friday, October 14, 2016 10:24 PM
To: Luis Gomez
Cc: An Ho; openflowplugin-dev; OpenDayLight-L2switch-Dev
Subject: Re: [openflowplugin-dev] [L2switch-dev] Blocker bug analysis

About the first one - https://bugs.opendaylight.org/show_bug.cgi?id=6575:

Tomas said in yesterday's meeting that this was dependent on 
https://bugs.opendaylight.org/show_bug.cgi?id=6710, and that it was fixed. Can 
we retest? Also a question for Tomas: 6710 is cluster related, while for 6575 it 
is not clear whether it is cluster or single node. Are we sure 6575 is fixed?

The second one: https://bugs.opendaylight.org/show_bug.cgi?id=6917 - Shuva will 
respond about that at some point.







Re: [L2switch-dev] [openflowplugin-dev] Blocker bug analysis

2016-10-14 Thread Tomáš Slušný
Also, 6710 has so far been merged only in the master branch, not yet in Boron. 
It has a cherry-pick ready here: https://git.opendaylight.org/gerrit/#/c/46761/, 
but until that is merged, I think it would be better to check for improvement in 
the Jenkins test for Carbon here: 
https://jenkins.opendaylight.org/releng/view/l2switch/job/l2switch-csit-1node-switch-only-carbon/


From: Tomáš Slušný 
Sent: October 14, 2016 22:13
To: Luis Gomez; Abhijit Kumbhare
Cc: An Ho; openflowplugin-dev; OpenDayLight-L2switch-Dev
Subject: Re: [openflowplugin-dev] [L2switch-dev] Blocker bug analysis


6710 was related to both cluster and single node because, even in the single 
node scenario, we register with ClusterSingletonService, which had problems 
when we tried to re-register before the previous registration was fully closed 
(that is, when we disconnected and reconnected too fast). Most of the fix was 
done in bug 6710, but part of it also had to be done on the openflowplugin side 
in this gerrit patch: https://git.opendaylight.org/gerrit/#/c/46390/. So, Luis, 
is it possible to run the failing l2switch test on that patch and see whether 
it solves anything?



From: Luis Gomez 
Sent: October 14, 2016 21:49
To: Abhijit Kumbhare
Cc: An Ho; openflowplugin-dev; OpenDayLight-L2switch-Dev
Subject: Re: [openflowplugin-dev] [L2switch-dev] Blocker bug analysis

Well, I think if the openflowplugin people cannot figure out where the issue is, 
the next step would be to get some debug and further analysis from the l2switch 
people explaining the application misbehavior.

If there are no people available in l2switch to do this, we can lower the 
priority to major or critical.

BR/Luis


On Oct 14, 2016, at 12:35 PM, Luis Gomez wrote:

As far as I can tell, nothing we have done so far has helped the l2switch test; 
it is still failing.

BR/Luis

On Oct 14, 2016, at 9:53 AM, Abhijit Kumbhare wrote:

About the first one - https://bugs.opendaylight.org/show_bug.cgi?id=6575:

Tomas said in yesterday's meeting that this was dependent on 
https://bugs.opendaylight.org/show_bug.cgi?id=6710, and that it was fixed. Can 
we retest? Also a question for Tomas: 6710 is cluster related, while for 6575 it 
is not clear whether it is cluster or single node. Are we sure 6575 is fixed?

The second one: https://bugs.opendaylight.org/show_bug.cgi?id=6917 - Shuva will 
respond about that at some point.






Re: [L2switch-dev] [openflowplugin-dev] Blocker bug analysis

2016-10-14 Thread Tomáš Slušný
6710 was related to both cluster and single node because, even in the single 
node scenario, we register with ClusterSingletonService, which had problems 
when we tried to re-register before the previous registration was fully closed 
(that is, when we disconnected and reconnected too fast). Most of the fix was 
done in bug 6710, but part of it also had to be done on the openflowplugin side 
in this gerrit patch: https://git.opendaylight.org/gerrit/#/c/46390/. So, Luis, 
is it possible to run the failing l2switch test on that patch and see whether 
it solves anything?
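
The re-registration problem described above can be illustrated with a toy model. This is only a sketch of the race as explained in this mail: `ToySingletonRegistry` and `reconnect` are made up for the example and do not mirror the real ClusterSingletonService API or the actual fix.

```python
class ToySingletonRegistry:
    """Toy model only; it does not mirror the real ClusterSingletonService API."""

    def __init__(self):
        self._state = {}  # service id -> "active" or "closing"

    def register(self, service_id):
        if service_id in self._state:
            raise RuntimeError(
                f"{service_id} is still {self._state[service_id]}; cannot register")
        self._state[service_id] = "active"

    def begin_close(self, service_id):
        self._state[service_id] = "closing"

    def finish_close(self, service_id):
        self._state.pop(service_id, None)


def reconnect(registry, service_id, wait_for_close):
    """Disconnect and immediately reconnect; return True if re-registration worked."""
    registry.begin_close(service_id)
    if wait_for_close:
        registry.finish_close(service_id)  # previous registration fully closed
    try:
        registry.register(service_id)
        return True
    except RuntimeError:
        return False


# Reconnecting before the close has completed fails in this model.
fast = ToySingletonRegistry()
fast.register("openflow:1")
too_fast_ok = reconnect(fast, "openflow:1", wait_for_close=False)

# Waiting for the close to complete lets the re-registration through.
slow = ToySingletonRegistry()
slow.register("openflow:1")
waited_ok = reconnect(slow, "openflow:1", wait_for_close=True)
```

In this model, the longer pause between mininet stop and start corresponds to the `wait_for_close=True` case.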



From: Luis Gomez 
Sent: October 14, 2016 21:49
To: Abhijit Kumbhare
Cc: An Ho; openflowplugin-dev; OpenDayLight-L2switch-Dev
Subject: Re: [openflowplugin-dev] [L2switch-dev] Blocker bug analysis

Well, I think if the openflowplugin people cannot figure out where the issue is, 
the next step would be to get some debug and further analysis from the l2switch 
people explaining the application misbehavior.

If there are no people available in l2switch to do this, we can lower the 
priority to major or critical.

BR/Luis


On Oct 14, 2016, at 12:35 PM, Luis Gomez wrote:

As far as I can tell, nothing we have done so far has helped the l2switch test; 
it is still failing.

BR/Luis

On Oct 14, 2016, at 9:53 AM, Abhijit Kumbhare wrote:

About the first one - https://bugs.opendaylight.org/show_bug.cgi?id=6575:

Tomas said in yesterday's meeting that this was dependent on 
https://bugs.opendaylight.org/show_bug.cgi?id=6710, and that it was fixed. Can 
we retest? Also a question for Tomas: 6710 is cluster related, while for 6575 it 
is not clear whether it is cluster or single node. Are we sure 6575 is fixed?

The second one: https://bugs.opendaylight.org/show_bug.cgi?id=6917 - Shuva will 
respond about that at some point.








Tomáš Slušný
Software Developer

Headquarters / Mlynské Nivy 56 / 821 05 Bratislava / Slovakia
R centrum / Janka Kráľa 9 /  974 01 Banská Bystrica / Slovakia
+421 911 083 902 / tomas.slu...@pantheon.tech
reception: +421 2 206 65 114 / www.pantheon.sk




Re: [L2switch-dev] [openflowplugin-dev] Blocker bug analysis

2016-10-14 Thread Luis Gomez
No luck :), still fails on "address tracker" and "loop remover":

https://jenkins.opendaylight.org/releng/view/l2switch/job/l2switch-csit-1node-switch-only-carbon/154/
https://logs.opendaylight.org/releng/jenkins092/l2switch-csit-1node-switch-only-carbon/154/archives/log.html.gz

BR/Luis


> On Oct 14, 2016, at 2:26 PM, Luis Gomez wrote:
> 
> OK Tomas, I will try your patch 
> https://git.opendaylight.org/gerrit/#/c/46390/ on 
> l2switch-csit-1node-switch-only-carbon and let you know.
> 
> 
>> On Oct 14, 2016, at 1:23 PM, Tomáš Slušný  wrote:
>> 
>> Also, 6710 has so far been merged only in the master branch, not yet in boron. 
>> A cherry-pick is ready here: 
>> https://git.opendaylight.org/gerrit/#/c/46761/, but until it is merged, I 
>> think it would be better to check for improvement in the Jenkins test for 
>> Carbon here: 
>> https://jenkins.opendaylight.org/releng/view/l2switch/job/l2switch-csit-1node-switch-only-carbon/
>> 
>> From: Tomáš Slušný 
>> Sent: 14 October 2016 22:13
>> To: Luis Gomez; Abhijit Kumbhare
>> Cc: An Ho; openflowplugin-dev; OpenDayLight-L2switch-Dev
>> Subject: Re: [openflowplugin-dev] [L2switch-dev] Blocker bug analysis
>> 
>> 6710 was related to both cluster and single node, because even in the 
>> single-node scenario we register with ClusterSingletonService, which had 
>> problems when we tried to re-register before the previous registration had 
>> fully closed (that is, when we disconnected and reconnected too fast). Most 
>> of the fix was done in bug 6710, but part of it also had to be done on the 
>> openflowplugin side in this gerrit patch: 
>> https://git.opendaylight.org/gerrit/#/c/46390/. So, Luis, is it possible to 
>> run the failing l2switch test on that patch and see whether it solves 
>> anything?
>> 
>> From: Luis Gomez 
>> Sent: 14 October 2016 21:49
>> To: Abhijit Kumbhare
>> Cc: An Ho; openflowplugin-dev; OpenDayLight-L2switch-Dev
>> Subject: Re: [openflowplugin-dev] [L2switch-dev] Blocker bug analysis
>> 
>> Well, I think if openflowplugin people cannot figure out where the issue is, 
>> the next step would be to get some debug output and further analysis from 
>> l2switch people explaining the application misbehavior.
>> 
>> If there are no people available in l2switch to do this, we can lower the 
>> priority to major or critical.
>> 
>> BR/Luis
>> 
>> 
>>> On Oct 14, 2016, at 12:35 PM, Luis Gomez  wrote:
>>> 
>>> As far as I can tell, nothing we have done so far has helped the l2switch 
>>> test; it is still failing.
>>> 
>>> BR/Luis
>>> 
>>>> On Oct 14, 2016, at 9:53 AM, Abhijit Kumbhare  wrote:
>>>> 
>>>> About the first one - https://bugs.opendaylight.org/show_bug.cgi?id=6575: 
>>>> 
>>>> Tomas has said in yesterday's meeting that this was dependent on 
>>>> https://bugs.opendaylight.org/show_bug.cgi?id=6710. And that it was fixed. 
>>>> Can we retest? Also a question for Tomas: 6710 is cluster related - while 
>>>> 6575 is not clear if it is cluster or single node. Are we sure 6575 is 
>>>> fixed?
>>>> 
>>>> The second one: https://bugs.opendaylight.org/show_bug.cgi?id=6917 - Shuva 
>>>> will respond about that some time.
>>>> 
>>>> 
>>>> 
>>>> On Thu, Oct 13, 2016 at 9:50 PM, Luis Gomez  wrote:
>>>> Hi all,
>>>> 
>>>> We are getting close to Boron SR1 so I think it makes sense to review the 
>>>> 2 blocking issues we have:
>>>> 
>>>> 
>>>> 1) https://bugs.opendaylight.org/show_bug.cgi?id=6575
>>>> 
>>>> Summary:
>>>> 
>>>> l2switch does not work well when mininet is disconnected and connected 
>>>> with no time in-between.
>>>> 
>>>> Description:
>>>> 
>>>> This is kind of old issue, since the He->Li migration the l2switch has 
>>>> experienced random issues in the system test:
>>>> 
>>>> https://jenkins.opendaylight.org/releng/view/l2switch/job/l2switch-csit-1node-switch-only-boron/
>>>> 
>>>> Same test passes fine in Beryllium as you can see below:
>>>> 
>>>> https://jenkins.opendaylight.org/releng/view/l2switch/job/l2switch-csit-1node-switch-only-beryllium/
>>>> 
>>>> The last discovery (just before Boron release) was that giving more time 
>>>> between stop mininet and start mininet made the suite pass.
>>>> 
>>>> Criticality:
>>>> 
>>>> Although this was a clear regression in l2switch test (Be->B), this bug 
>>>> was not initially marked as blocker because it was not trivial to 
>>>> reproduce (e.g. switch connection flap).
>>>> 
>>>> Risk of not fixing:
>>>> 
>>>> l2switch and other similar applications relying on ofplugin may not work 
>>>> well when switch connection flaps.
>>>> 
>>>> 
>>>> 2) https://bugs.opendaylight.org/show_bug.cgi?id=6917
>>>> 
>>>> Summary:
>>>> 
>>>> Flow matching function (operational flow reconciliation) is not stable.
>>>> 
>>>> Description:
>>>> 
>>>> I discovered this issue doing some random flow push test in my laptop 
>>>> using POSTMAN: adding and deleting 

Re: [L2switch-dev] [openflowplugin-dev] Blocker bug analysis

2016-10-14 Thread Luis Gomez
OK Tomas, I will try your patch https://git.opendaylight.org/gerrit/#/c/46390/ 
on l2switch-csit-1node-switch-only-carbon and let you know.
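
As a rough illustration of the re-registration problem Tomáš describes in the quoted message below, here is a toy model. It assumes, purely for the sake of the sketch, that a second registration under the same name is rejected while the previous one is still closing. `SingletonRegistry`, `reconnect`, and the `openflow:1` name are hypothetical and invented here; none of this is the real ClusterSingletonService API or the actual fix in either patch.

```python
class SingletonRegistry:
    """Toy model: one registration per service name, with a two-step close."""

    def __init__(self) -> None:
        self._state = {}  # service name -> "registered" or "closing"

    def register(self, name: str) -> None:
        # Assumption of this sketch: registering is refused while an earlier
        # registration for the same name has not fully closed.
        if name in self._state:
            raise RuntimeError(f"{name} is still {self._state[name]}")
        self._state[name] = "registered"

    def begin_close(self, name: str) -> None:
        # Switch disconnected: the close has started but is not complete.
        if name in self._state:
            self._state[name] = "closing"

    def finish_close(self, name: str) -> None:
        # Close completed: the name can be registered again.
        self._state.pop(name, None)


def reconnect(registry: SingletonRegistry, name: str, wait_for_close: bool) -> bool:
    """Disconnect and reconnect a switch; return True if re-registration works."""
    registry.begin_close(name)
    if wait_for_close:
        registry.finish_close(name)  # models leaving time between stop and start
    try:
        registry.register(name)
        return True
    except RuntimeError:
        return False


registry = SingletonRegistry()
registry.register("openflow:1")
fast = reconnect(registry, "openflow:1", wait_for_close=False)  # flap with no gap
registry.finish_close("openflow:1")  # the pending close eventually completes
registry.register("openflow:1")
slow = reconnect(registry, "openflow:1", wait_for_close=True)   # flap with a gap
```

In this model only the reconnect that waits for the close succeeds, which is at least consistent with the observation that leaving time between mininet stop and start makes the suite pass.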


> On Oct 14, 2016, at 1:23 PM, Tomáš Slušný  wrote:
> 
> Also, 6710 has so far been merged only in the master branch, not yet in 
> boron. A cherry-pick is ready here: 
> https://git.opendaylight.org/gerrit/#/c/46761/, but until it is merged, I 
> think it would be better to check for improvement in the Jenkins test for 
> Carbon here: 
> https://jenkins.opendaylight.org/releng/view/l2switch/job/l2switch-csit-1node-switch-only-carbon/
> 
> From: Tomáš Slušný 
> Sent: 14 October 2016 22:13
> To: Luis Gomez; Abhijit Kumbhare
> Cc: An Ho; openflowplugin-dev; OpenDayLight-L2switch-Dev
> Subject: Re: [openflowplugin-dev] [L2switch-dev] Blocker bug analysis
> 
> 6710 was related to both cluster and single node, because even in the 
> single-node scenario we register with ClusterSingletonService, which had 
> problems when we tried to re-register before the previous registration had 
> fully closed (that is, when we disconnected and reconnected too fast). Most 
> of the fix was done in bug 6710, but part of it also had to be done on the 
> openflowplugin side in this gerrit patch: 
> https://git.opendaylight.org/gerrit/#/c/46390/. So, Luis, is it possible to 
> run the failing l2switch test on that patch and see whether it solves 
> anything?
> 
> From: Luis Gomez 
> Sent: 14 October 2016 21:49
> To: Abhijit Kumbhare
> Cc: An Ho; openflowplugin-dev; OpenDayLight-L2switch-Dev
> Subject: Re: [openflowplugin-dev] [L2switch-dev] Blocker bug analysis
>  
> Well, I think if openflowplugin people cannot figure out where the issue is, 
> the next step would be to get some debug output and further analysis from 
> l2switch people explaining the application misbehavior.
> 
> If there are no people available in l2switch to do this, we can lower the 
> priority to major or critical.
> 
> BR/Luis
> 
> 
>> On Oct 14, 2016, at 12:35 PM, Luis Gomez  wrote:
>> 
>> As far as I can tell, nothing we have done so far has helped the l2switch 
>> test; it is still failing.
>> 
>> BR/Luis
>> 
>>> On Oct 14, 2016, at 9:53 AM, Abhijit Kumbhare  wrote:
>>> 
>>> About the first one - https://bugs.opendaylight.org/show_bug.cgi?id=6575: 
>>> 
>>> Tomas has said in yesterday's meeting that this was dependent on 
>>> https://bugs.opendaylight.org/show_bug.cgi?id=6710. And that it was fixed. 
>>> Can we retest? Also a question for Tomas: 6710 is cluster related - while 
>>> 6575 is not clear if it is cluster or single node. Are we sure 6575 is 
>>> fixed?
>>> 
>>> The second one: https://bugs.opendaylight.org/show_bug.cgi?id=6917 - Shuva 
>>> will respond about that some time.
>>> 
>>> 
>>> 
>>> On Thu, Oct 13, 2016 at 9:50 PM, Luis Gomez  wrote:
>>> Hi all,
>>> 
>>> We are getting close to Boron SR1 so I think it makes sense to review the 2 
>>> blocking issues we have:
>>> 
>>> 
>>> 1) https://bugs.opendaylight.org/show_bug.cgi?id=6575
>>> 
>>> Summary:
>>> 
>>> l2switch does not work well when mininet is disconnected and connected with 
>>> no time in-between.
>>> 
>>> Description:
>>> 
>>> This is kind of old issue, since the He->Li migration the l2switch has 
>>> experienced random issues in the system test:
>>> 
>>> https://jenkins.opendaylight.org/releng/view/l2switch/job/l2switch-csit-1node-switch-only-boron/
>>> 
>>> Same test passes fine in Beryllium as you can see below:
>>> 
>>> https://jenkins.opendaylight.org/releng/view/l2switch/job/l2switch-csit-1node-switch-only-beryllium/
>>> 
>>> The last discovery (just before Boron release) was that giving more time 
>>> between stop mininet and start mininet made the suite pass.
>>> 
>>> Criticality:
>>> 
>>> Although this was a clear regression in l2switch test (Be->B), this bug was 
>>> not initially marked as blocker because it was not trivial to reproduce 
>>> (e.g. switch connection flap).
>>> 
>>> Risk of not fixing:
>>> 
>>> l2switch and other similar applications relying on ofplugin may not work 
>>> well when switch connection flaps.
>>> 
>>> 
>>> 2) https://bugs.opendaylight.org/show_bug.cgi?id=6917
>>> 
>>> Summary:
>>> 
>>> Flow matching function (operational flow reconciliation) is not stable.
>>> 
>>> Description:
>>> 
>>> I discovered this issue doing some random flow push test in my laptop using 
>>> POSTMAN: adding and deleting the same flow few times produced an alien ID 
>>> in the operational flow.
>>> After that I have created a test that does exactly that: add flow, verify 
>>> operational ID, delete flow, sleep 5s, repeat. With these simple steps the 
>>> issue shows consistently for Boron (new test):
>>> 
>>> https://logs.opendaylight.org/releng/jenkins092/openflowplugin-csit-1node-flow-services-only-boron/758/archives/log.html.gz
>>> 
>>> But not in Beryllium:
>>> 
>>>