Abhijit - I don't know why this happens only in Carbon. We are trying to reproduce 
this issue locally, but so far everything is working fine. I will let you know 
if we find something.

Jozef

From: Abhijit Kumbhare [mailto:[email protected]]
Sent: Monday, March 6, 2017 7:51 PM
To: Luis Gomez <[email protected]>; [email protected]
Cc: Vratko Polak -X (vrpolak - PANTHEON TECHNOLOGIES at Cisco) 
<[email protected]>
Subject: Re: [openflowplugin-dev] [integration-dev] Clustering acceptance tests

Removed the other lists temporarily:

Luis>> Unfortunately, last week we found a new issue in Carbon: 
https://bugs.opendaylight.org/show_bug.cgi?id=7884

Jozef - do you have any idea why this bug would occur only in Carbon and not 
in Boron?

On Sun, Mar 5, 2017 at 7:11 PM, Luis Gomez 
<[email protected]<mailto:[email protected]>> wrote:
Hi Vratko, some update on OpenFlow cluster issues:

1) table miss flow only pushed by 1 instance (new bug): 
https://bugs.opendaylight.org/show_bug.cgi?id=7770

There is already a candidate fix.


2) restart of device owner in non-HA scenarios does not work (old bug): 
https://bugs.opendaylight.org/show_bug.cgi?id=6459

This issue will be addressed in this other bug: 
https://bugs.opendaylight.org/show_bug.cgi?id=7763


3) Openflow cluster performance issues (old bug): 
https://bugs.opendaylight.org/show_bug.cgi?id=6755

I opened a bug against the controller project to better understand the log ERRORs: 
https://bugs.opendaylight.org/show_bug.cgi?id=7901

Unfortunately, last week we found a new issue in Carbon: 
https://bugs.opendaylight.org/show_bug.cgi?id=7884

BR/Luis


On Feb 9, 2017, at 5:32 PM, Luis Gomez 
<[email protected]<mailto:[email protected]>> wrote:

Hi Vratko,

I investigated the issue I mentioned to you and created a bug for it. Currently 
we have these cluster-related bugs in OpenFlow identified by the system test 
(there could be more):

1) table miss flow only pushed by 1 instance (new bug): 
https://bugs.opendaylight.org/show_bug.cgi?id=7770
2) restart of device owner in non-HA scenarios does not work (old bug): 
https://bugs.opendaylight.org/show_bug.cgi?id=6459
3) Openflow cluster performance issues (old bug): 
https://bugs.opendaylight.org/show_bug.cgi?id=6755

As you said, it is unclear whether the OpenFlow cluster issues are OpenFlow- or 
cluster-related. All bugs are now in the OpenFlow queue, and I would expect the 
OpenFlow devs to move them to the cluster queue if that is where they belong.

BR/Luis


On Feb 7, 2017, at 10:34 AM, Luis Gomez 
<[email protected]<mailto:[email protected]>> wrote:


On Feb 7, 2017, at 10:03 AM, Vratko Polak -X (vrpolak - PANTHEON TECHNOLOGIES 
at Cisco) <[email protected]<mailto:[email protected]>> wrote:

Two more questions.

> https://jenkins.opendaylight.org/releng/view/CSIT-3node/job/openflowplugin-csit-3node-clustering-only-boron/
> Cluster non HA test

I just realized 1) and 2) are the same job.
I am not sure which of the six suites [1]
you are referring to.

Typo, this is the link for non-HA: 
https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-3node-periodic-bulkomatic-clustering-daily-only-boron/



>> but other tests are not, I will have to investigate this.
>
> Keep us informed.

Do you have an ETA?

I would say in the next 2 weeks I will have something in place for cluster 
scalability.



Vratko.

[1] 
https://logs.opendaylight.org/releng/jenkins092/openflowplugin-csit-3node-clustering-only-carbon/470/archives/log.html.gz

From: Vratko Polak -X (vrpolak - PANTHEON TECHNOLOGIES at Cisco)
Sent: 7 February, 2017 15:05
To: 'Luis Gomez' <[email protected]<mailto:[email protected]>>
Cc: 
[email protected]<mailto:[email protected]>;
 
[email protected]<mailto:[email protected]>;
 openflowplugin-dev 
<[email protected]<mailto:[email protected]>>
Subject: RE: [integration-dev] Clustering acceptance tests

Thanks Luis.

> but other tests are not, I will have to investigate this.

Keep us informed.

> 3) & 4) is probably controller cluster limitation.

Both jobs occasionally pass,
and I have opened a Bug [0] for exceptions in the karaf log.
To me, it looks like an error in OpenFlowPlugin
(as opposed to Controller) code.

> writing very fast (REST or internal app) on a shard follower DS, and reading 
> on the other follower.

We plan to expand controller-csit-3node-rest-clust-cars-perf-only-carbon;
we are not sure yet whether this scenario will be included.

Vratko.

[0] https://bugs.opendaylight.org/show_bug.cgi?id=7750

From: Luis Gomez [mailto:[email protected]]
Sent: 7 February, 2017 08:35
To: Vratko Polak -X (vrpolak - PANTHEON TECHNOLOGIES at Cisco) 
<[email protected]<mailto:[email protected]>>
Cc: 
[email protected]<mailto:[email protected]>;
 
[email protected]<mailto:[email protected]>;
 openflowplugin-dev 
<[email protected]<mailto:[email protected]>>
Subject: Re: [integration-dev] Clustering acceptance tests

Here is what I know from OpenFlow plugin (cc-ing ofplugin devs):

* Does your project have a test plan mentioning specific cluster scenarios?

Not a written test plan, but we are running a number of cluster tests.

* Do you have any of such scenarios implemented as Robot suites?

1) 
https://jenkins.opendaylight.org/releng/view/CSIT-3node/job/openflowplugin-csit-3node-clustering-only-boron/
 ->  Cluster HA test (DPN connects to all nodes). It used to pass except for 1 
test (member isolation with iptables); now I see this test is stable but other 
tests are not. I will have to investigate this.

2) 
https://jenkins.opendaylight.org/releng/view/CSIT-3node/job/openflowplugin-csit-3node-clustering-only-boron/
 -> Cluster non-HA test (DPN connects to 1 node), failing because of this old bug: 
https://bugs.opendaylight.org/show_bug.cgi?id=6459.

3) 
https://jenkins.opendaylight.org/releng/view/CSIT-3node/job/openflowplugin-csit-3node-periodic-bulkomatic-clustering-perf-daily-only-boron/
 -> Max flows/sec using bulk-o-matic DS on cluster setup. Not fully working 
because of a cluster backend limitation: 
https://bugs.opendaylight.org/show_bug.cgi?id=6755

4) 
https://jenkins.opendaylight.org/releng/view/CSIT-3node/job/openflowplugin-csit-3node-periodic-restconf-clustering-perf-daily-only-boron/
 -> Max flows/sec using NB REST on cluster setup; this has never worked very well 
because of the previous bug.

* Do the robot suites have failures, suspected to be caused by clustering
  (as opposed to application logic, or mistakes in Robot code)?

So far I think the issue in 2) is in the OpenFlow cluster implementation, and the 
issue in 3) & 4) is probably a controller cluster limitation.

* Are there open Bugs corresponding to the clustering failures?

Yes, except for 1), which will require some analysis of the unstable tests.

* Are you planning to implement more Robot 3node suites until Carbon release?

I will probably replace one of the performance suites (there is no point in running 
two if they do not work) with a cluster switch scalability test.

* Are there scenarios you would like Controller team to cover using mock apps?

I think the issue in 3) & 4) could be reproduced in the controller project just by 
writing very fast (REST or internal app) to the DS on one shard follower, and 
reading from the other follower.

On Feb 6, 2017, at 5:31 AM, Vratko Polak -X (vrpolak - PANTHEON TECHNOLOGIES at 
Cisco) <[email protected]<mailto:[email protected]>> wrote:

Hello Test Contacts.

In Controller project, our highest priority
for Carbon release is to make sure ODL clustering
is usable and stable.

We are in the phase of formulating explicit acceptance criteria,
so we can create an execution plan for turning them into Robot suites.

Of course, clustering is not very useful by itself;
it is a tool that applications use to reach their goals.
So real acceptance criteria for clustering should also
take into account whether ODL applications can work in a cluster.

Many projects are already running their 3node CSIT tests,
but some important scenarios might not be covered yet,
and some suites might be too unstable to serve as acceptance tests.

Controller team is small and busy, so we are asking for help.
Here is a set of quick questions for test contacts:
* Does your project have a test plan mentioning specific cluster scenarios?
* Do you have any of such scenarios implemented as Robot suites?
* Do the robot suites have failures, suspected to be caused by clustering
  (as opposed to application logic, or mistakes in Robot code)?
* Are there open Bugs corresponding to the clustering failures?
* Are you planning to implement more Robot 3node suites until Carbon release?
* Are there scenarios you would like Controller team to cover using mock apps?

Vratko (as a Controller test contact).
_______________________________________________
integration-dev mailing list
[email protected]<mailto:[email protected]>
https://lists.opendaylight.org/mailman/listinfo/integration-dev





_______________________________________________
openflowplugin-dev mailing list
[email protected]
https://lists.opendaylight.org/mailman/listinfo/openflowplugin-dev
