Hi Jamo, I can confirm the controller patch introduced the regression. After building the revert:
https://git.opendaylight.org/gerrit/#/c/53643/

things go back to normal in the cluster test:

https://logs.opendaylight.org/sandbox/jenkins091/openflowplugin-csit-3node-clustering-only-carbon/4/archives/log.html.gz

BR/Luis

> On Mar 21, 2017, at 3:22 PM, Luis Gomez <[email protected]> wrote:
>
> Right, something really broke the ofp cluster in carbon between Mar 19th
> 7:22AM UTC and Mar 20th 10:53AM UTC. The patch you point out is in that
> interval.
>
> It seems the controller cluster test in carbon is far from stable, so it is
> difficult to tell when the regression was introduced by looking at it:
>
> https://jenkins.opendaylight.org/releng/view/CSIT-3node/job/controller-csit-3node-clustering-only-carbon/
>
> Finally, how do controller people verify patches? I do not see any patch
> test job like we have in other projects.
>
> BR/Luis
>
>> On Mar 21, 2017, at 2:15 PM, Jamo Luhrsen <[email protected]> wrote:
>>
>> +openflowplugin and controller teams
>>
>> TL;DR
>>
>> I think this controller patch caused some breakages in our 3node CSIT:
>>
>> https://git.opendaylight.org/gerrit/#/c/49265/
>>
>> It affects both the functionality of the controller and also gives us a
>> ton more logs, which creates other problems.
>>
>> I think 3node ofp csit is broken too:
>>
>> https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-3node-clustering-only-carbon/
>>
>> I ran some csit tests in the sandbox (jobs 1-4) here:
>>
>> https://jenkins.opendaylight.org/sandbox/job/netvirt-csit-3node-openstack-newton-nodl-v2-jamo-upstream-transparent-carbon/
>>
>> you can see job 1 is yellow, and the rest are 100% pass. They are using
>> distros from nexus as they were published, from *4500.zip down to *4997.zip.
>>
>> the only difference between 4500 and 4499 is that controller patch above.
>>
>> Of course something in our env/csit could have changed too, but the karaf
>> logs are definitely bigger in netvirt csit. We collect just exceptions in
>> a single file and it's ~30x more in a failed job (a rough sketch of that
>> kind of comparison is after the quoted thread).
>>
>> Thanks,
>> JamO
>>
>> On 03/21/2017 01:49 PM, Jamo Luhrsen wrote:
>>> current theory is our karaf.log is getting a lot more messages now. I found
>>> one job that didn't get aborted. It did run for 5h33m though:
>>>
>>> https://jenkins.opendaylight.org/releng/view/netvirt-csit/job/netvirt-csit-3node-openstack-newton-nodl-v2-upstream-transparent-carbon/376/
>>>
>>> the robot logs didn't get created because the generated output.xml was too
>>> big, so the tool that makes the .html reports failed or quit. Locally, I
>>> could create the .html with that output.xml (see the rebot sketch after
>>> the quoted thread).
>>>
>>> We have had this trouble before, where all of a sudden lots more logging
>>> comes in and it breaks our jobs.
>>>
>>> still getting to the bottom of it...
>>>
>>> JamO
>>>
>>> On 03/21/2017 10:39 AM, Jamo Luhrsen wrote:
>>>> Netvirt, Integration,
>>>>
>>>> we need to figure out and fix what's wrong with the netvirt 3node carbon
>>>> csit.
>>>>
>>>> the jobs are timing out at our jenkins 6h limit. that means we don't
>>>> get any logs either.
>>>>
>>>> This will likely cause a large backlog in our jenkins queue.
>>>>
>>>> If anyone has cycles at the moment to help, catch me on IRC.
>>>>
>>>> Initially, with Alon's help, we know that this job [0] was not seeing
>>>> this trouble, while this job [1] was.
>>>>
>>>> the difference in ODL patches between the two distros that were used
>>>> includes some controller patches that seem cluster related. here are
>>>> all the patches that came in between the two:
>>>>
>>>> controller https://git.opendaylight.org/gerrit/49265 BUG-5280: add
>>>> frontend state lifecycle
>>>> controller https://git.opendaylight.org/gerrit/49738 BUG-2138: Use
>>>> correct actor context in shard lookup.
>>>> controller https://git.opendaylight.org/gerrit/49663 BUG-2138: Fix
>>>> shard registration with ProxyProducers.
>>>>
>>>> From the looks of the console log (all we have) it seems that each
>>>> test case is just taking a long time. I don't know more than that
>>>> at the moment.
>>>>
>>>> JamO
>>>>
>>>>
>>>> [0] https://jenkins.opendaylight.org/releng/view/netvirt-csit/job/netvirt-csit-3node-openstack-newton-nodl-v2-upstream-transparent-carbon/373/
>>>> [1] https://jenkins.opendaylight.org/releng/view/netvirt-csit/job/netvirt-csit-3node-openstack-newton-nodl-v2-upstream-transparent-carbon/374/
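A note on the "~30x more exceptions" comparison above: a hypothetical helper along
these lines (the matching regex and log paths are illustrative, this is not the
actual CSIT collection script) is enough to compare karaf.log exception volume
between a passing and a failing job:

# Hypothetical helper (not the actual CSIT script): count exception-looking
# lines in one or more karaf.log files to compare passing vs failing jobs.
import re
import sys

# Rough heuristic: stack-trace frames plus lines mentioning Exception/Error.
EXCEPTION_RE = re.compile(r"^\s+at |Exception|\bError\b")

def count_exception_lines(path):
    with open(path, errors="replace") as f:
        return sum(1 for line in f if EXCEPTION_RE.search(line))

if __name__ == "__main__":
    # Usage: python count_exceptions.py karaf_pass.log karaf_fail.log
    for log in sys.argv[1:]:
        print(f"{log}: {count_exception_lines(log)} exception lines")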
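And on regenerating the robot reports locally from an oversized output.xml:
Robot Framework's rebot entry point can rebuild log.html/report.html from an
existing output.xml. A minimal sketch, assuming Robot Framework is installed
and the default file names below (which may not match what the job produced):

# Minimal sketch: rebuild the Robot Framework HTML reports from an existing
# output.xml that was too big for the CI post-processing step to handle.
# Assumes "pip install robotframework" and that output.xml is in the current
# directory; file names are illustrative.
from robot import rebot

rebot(
    "output.xml",          # the oversized results file pulled from the job
    log="log.html",        # regenerate the detailed log
    report="report.html",  # regenerate the summary report
    loglevel="INFO",       # drop TRACE/DEBUG messages to keep the log smaller
)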
_______________________________________________
openflowplugin-dev mailing list
[email protected]
https://lists.opendaylight.org/mailman/listinfo/openflowplugin-dev
