You are right, Jozef: I tried hard today with no luck. I think this is a tricky and inconsistent issue. Also, when it fails in CI I do not see anything in the karaf logs.
> On Mar 7, 2017, at 9:49 AM, Luis Gomez <[email protected]> wrote:
>
> I tried a couple of days ago and it reproduced straight forward, I will try
> again today and let you know.
>
>> On Mar 7, 2017, at 12:54 AM, Jozef Bacigál <[email protected]> wrote:
>>
>> Abhijit - I don’t know why only in carbon, we are trying to achieve this
>> issue locally, still everything is working fine. I will you inform if we get
>> something.
>>
>> Jozef
>>
>> From: Abhijit Kumbhare [mailto:[email protected]]
>> Sent: Monday, March 6, 2017 7:51 PM
>> To: Luis Gomez <[email protected]>;
>> [email protected]
>> Cc: Vratko Polak -X (vrpolak - PANTHEON TECHNOLOGIES at Cisco)
>> <[email protected]>
>> Subject: Re: [openflowplugin-dev] [integration-dev] Clustering acceptance
>> tests
>>
>> Removed the other lists temporarily:
>>
>> Luis>> Unfortunately last week we realized of a new issue in carbon:
>> https://bugs.opendaylight.org/show_bug.cgi?id=7884
>>
>> Jozef - is this something you have any idea why this bug would only be in
>> Carbon not in Boron?
>>
>> On Sun, Mar 5, 2017 at 7:11 PM, Luis Gomez <[email protected]> wrote:
>> Hi Vratko, some update on OpenFlow cluster issues:
>>
>> 1) table miss flow only pushed by 1 instance (new bug):
>> https://bugs.opendaylight.org/show_bug.cgi?id=7770
>>
>> There is already a candidate fix.
>>
>>
>> 2) restart of device owner in non-HA scenarios does not work (old bug):
>> https://bugs.opendaylight.org/show_bug.cgi?id=6459
>>
>> This issue will be addressed in this other bug:
>> https://bugs.opendaylight.org/show_bug.cgi?id=7763
>>
>>
>> 3) Openflow cluster performance issues (old bug):
>> https://bugs.opendaylight.org/show_bug.cgi?id=6755
>>
>> I opened bug to controller project to better understand the log
>> ERRORs: https://bugs.opendaylight.org/show_bug.cgi?id=7901
>>
>> Unfortunately last week we realized of a new issue in carbon:
>> https://bugs.opendaylight.org/show_bug.cgi?id=7884
>>
>> BR/Luis
>>
>>
>> On Feb 9, 2017, at 5:32 PM, Luis Gomez <[email protected]> wrote:
>>
>> Hi Vratko,
>>
>> I investigated the issue I commented to you and created a bug for it,
>> currently we have these cluster related bugs in OpenFlow identified by the
>> system test (there could be more):
>>
>> 1) table miss flow only pushed by 1 instance (new bug):
>> https://bugs.opendaylight.org/show_bug.cgi?id=7770
>> 2) restart of device owner in non-HA scenarios does not work (old bug):
>> https://bugs.opendaylight.org/show_bug.cgi?id=6459
>> 3) Openflow cluster performance issues (old bug):
>> https://bugs.opendaylight.org/show_bug.cgi?id=6755
>>
>> As you said it is unclear whether openflow cluster issues are openflow or
>> cluster related, all bugs are now in openflow queue and I would expect
>> openflow devs to move to cluster queue if that is where they belong to.
>>
>> BR/Luis
>>
>>
>> On Feb 7, 2017, at 10:34 AM, Luis Gomez <[email protected]> wrote:
>>
>>
>> On Feb 7, 2017, at 10:03 AM, Vratko Polak -X (vrpolak - PANTHEON
>> TECHNOLOGIES at Cisco) <[email protected]> wrote:
>>
>> Two more questions.
>>
>> > https://jenkins.opendaylight.org/releng/view/CSIT-3node/job/openflowplugin-csit-3node-clustering-only-boron/
>> > Cluster non HA test
>>
>> I just realized 1) and 2) are the same job.
>> I am not sure which of the six suites [1]
>> are you referring to.
>>
>> Typo, this is the link for non-HA:
>> https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-3node-periodic-bulkomatic-clustering-daily-only-boron/
>>
>>
>> >> but other tests are not, I will have to investigate this.
>> >
>> > Keep us informed.
>>
>> Do you have an ETA?
>>
>> I would say in the next 2 weeks I will have something in place for cluster
>> scalability.
>>
>>
>> Vratko.
>>
>> [1]
>> https://logs.opendaylight.org/releng/jenkins092/openflowplugin-csit-3node-clustering-only-carbon/470/archives/log.html.gz
>>
>> From: Vratko Polak -X (vrpolak - PANTHEON TECHNOLOGIES at Cisco)
>> Sent: 7 February, 2017 15:05
>> To: 'Luis Gomez' <[email protected]>
>> Cc: [email protected];
>> [email protected]; openflowplugin-dev
>> <[email protected]>
>> Subject: RE: [integration-dev] Clustering acceptance tests
>>
>> Thanks Luis.
>>
>> > but other tests are not, I will have to investigate this.
>>
>> Keep us informed.
>>
>> > 3) & 4) is probably controller cluster limitation.
>>
>> Both jobs occasionally pass,
>> and I have opened a Bug [0] for exceptions in karaf log.
>> To me, it looks like an error in OpenflowPlugin
>> (as opposed to Controller) code.
>>
>> > writing very fast (REST or internal app) on a shard follower DS, and
>> > reading on the other follower.
>>
>> We plan to expand controller-csit-3node-rest-clust-cars-perf-only-carbon,
>> not sure yet whether this scenario will be included.
>>
>> Vratko.
>>
>> [0] https://bugs.opendaylight.org/show_bug.cgi?id=7750
>>
>> From: Luis Gomez [mailto:[email protected]]
>> Sent: 7 February, 2017 08:35
>> To: Vratko Polak -X (vrpolak - PANTHEON TECHNOLOGIES at Cisco)
>> <[email protected]>
>> Cc: [email protected];
>> [email protected]; openflowplugin-dev
>> <[email protected]>
>> Subject: Re: [integration-dev] Clustering acceptance tests
>>
>> Here is what I know from OpenFlow plugin (cc-ing ofplugin devs):
>>
>> * Does your project have a test plan mentioning specific cluster scenarios?
>>
>> Not written test plan but we are running a bunch of cluster tests.
>>
>>
>> * Do you have any of such scenarios implemented as Robot suites?
>>
>> 1)
>> https://jenkins.opendaylight.org/releng/view/CSIT-3node/job/openflowplugin-csit-3node-clustering-only-boron/
>> -> Cluster HA test (DPN connect to all nodes), it used to pass except for
>> 1 test (member isolation with iptables), now I see this test is stable but
>> other tests are not, I will have to investigate this.
>>
>> 2)
>> https://jenkins.opendaylight.org/releng/view/CSIT-3node/job/openflowplugin-csit-3node-clustering-only-boron/
>> -> Cluster non HA test (DPN connect to 1 node), failing because this old
>> bug: https://bugs.opendaylight.org/show_bug.cgi?id=6459.
>>
>> 3)
>> https://jenkins.opendaylight.org/releng/view/CSIT-3node/job/openflowplugin-csit-3node-periodic-bulkomatic-clustering-perf-daily-only-boron/
>> -> Max flows/sec using bulk-o-matic DS on cluster setup. Not fully working
>> because some cluster backend limitation
>> https://bugs.opendaylight.org/show_bug.cgi?id=6755
>>
>> 4)
>> https://jenkins.opendaylight.org/releng/view/CSIT-3node/job/openflowplugin-csit-3node-periodic-restconf-clustering-perf-daily-only-boron/
>> -> Max flows/sec using NB REST on cluster setup, this never worked very
>> good because previous bug.
>>
>> * Do the robot suites have failures, suspected to be caused by clustering
>> (as opposed to application logic, or mistakes in Robot code)?
>>
>> So far I think issue in 2) is OpenFlow cluster implementation and issue in
>> 3) & 4) is probably controller cluster limitation.
>>
>>
>> * Are there open Bugs corresponding to the clustering failures?
>>
>> Yes, except for 1) that will require some analysis on the unstable tests.
>>
>>
>> * Are you planning to implement more Robot 3node suites until Carbon release?
>>
>> I will probably replace 1 of the performance suites (no point to run 2 if
>> they do not work) by a cluster switch scalability test.
>>
>>
>> * Are there scenarios you would like Controller team to cover using mock
>> apps?
>>
>> I think issue in 3) & 4) could be reproduced in controller project by just
>> writing very fast (REST or internal app) on a shard follower DS, and reading
>> on the other follower.
>>
>> On Feb 6, 2017, at 5:31 AM, Vratko Polak -X (vrpolak - PANTHEON TECHNOLOGIES
>> at Cisco) <[email protected]> wrote:
>>
>> Hello Test Contacts.
>>
>> In Controller project, our highest priority
>> for Carbon release is to make sure ODL clustering
>> is usable and stable.
>>
>> We are in the phase of formulating explicit acceptance criteria,
>> so we can create execution plan for turning them into Robot suites.
>>
>> Of course, clustering is not very useful just by itself,
>> it is used as a tool applications can use to reach their goals.
>> So real acceptance criteria for clustering should also
>> take into account whether ODL applications can work in cluster.
>>
>> Many projects are already running their 3node CSIT tests,
>> but on one hand, some important scenarios might be not covered yet,
>> and some suites might be too unstable to serve as acceptance tests.
>>
>> Controller team is small and busy, so we are asking for help.
>> Here is a set of quick questions for test contacts:
>> * Does your project have a test plan mentioning specific cluster scenarios?
>> * Do you have any of such scenarios implemented as Robot suites?
>> * Do the robot suites have failures, suspected to be caused by clustering
>> (as opposed to application logic, or mistakes in Robot code)?
>> * Are there open Bugs corresponding to the clustering failures?
>> * Are you planning to implement more Robot 3node suites until Carbon release?
>> * Are there scenarios you would like Controller team to cover using mock
>> apps?
>>
>> Vratko (as a Controller test contact).
>> _______________________________________________
>> integration-dev mailing list
>> [email protected]
>> https://lists.opendaylight.org/mailman/listinfo/integration-dev
_______________________________________________
openflowplugin-dev mailing list
[email protected]
https://lists.opendaylight.org/mailman/listinfo/openflowplugin-dev
