Hi Jamo, I can confirm the controller patch introduced the regression.

After building the revert:

https://git.opendaylight.org/gerrit/#/c/53643/

things go back to normal in the cluster test:

https://logs.opendaylight.org/sandbox/jenkins091/openflowplugin-csit-3node-clustering-only-carbon/4/archives/log.html.gz

BR/Luis


> On Mar 21, 2017, at 3:22 PM, Luis Gomez <[email protected]> wrote:
> 
> Right, something really broke the ofp cluster in carbon between Mar 19th 
> 7:22AM UTC and Mar 20th 10:53AM UTC. The patch you point out is in that 
> interval.
> 
> It seems the controller cluster test in carbon is far from stable, so it is 
> difficult to tell when the regression was introduced by looking at it:
> 
> https://jenkins.opendaylight.org/releng/view/CSIT-3node/job/controller-csit-3node-clustering-only-carbon/
> 
> Finally, how do the controller people verify patches? I do not see any patch 
> test job like we have in other projects.
> 
> BR/Luis
> 
>> On Mar 21, 2017, at 2:15 PM, Jamo Luhrsen <[email protected]> wrote:
>> 
>> +openflowplugin and controller teams
>> 
>> TL;DR
>> 
>> I think this controller patch caused some breakages in our 3node CSIT.
>> 
>> https://git.opendaylight.org/gerrit/#/c/49265/ 
>> 
>> 
>> It affects both the functionality of the controller and also gives us a ton
>> more logs, which creates other problems.
>> 
>> I think the 3node ofp csit is broken too:
>> 
>> https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-3node-clustering-only-carbon/
>> 
>> I ran some csit tests in the sandbox (jobs 1-4) here:
>> 
>> https://jenkins.opendaylight.org/sandbox/job/netvirt-csit-3node-openstack-newton-nodl-v2-jamo-upstream-transparent-carbon/
>> 
>> 
>> You can see job 1 is yellow and the rest are 100% pass. They are using
>> distros from Nexus as they were published, from *4500.zip down to *4997.zip.
>> 
>> The only difference between 4500 and 4499 is the controller patch above.
>> 
>> Of course something in our env/csit could have changed too, but the karaf
>> logs are definitely bigger in netvirt csit. We collect just the exceptions in
>> a single file, and it is ~30x bigger in a failed job.
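>> 
>> The collection is roughly along these lines (just an illustrative Python
>> sketch with placeholder file names, not the actual CSIT script):
>> 
>> # Illustrative sketch only -- not the actual CSIT collection script.
>> # Pull java exception traces out of a karaf.log into a single file so
>> # the volume can be compared between jobs (paths are placeholders).
>> import re
>> 
>> EXC_LINE = re.compile(r'(Exception|Error)(:|\s|$)')
>> 
>> def collect_exceptions(karaf_log, out_file):
>>     count = 0
>>     with open(karaf_log) as src, open(out_file, 'w') as dst:
>>         for line in src:
>>             # keep exception headers and their stack-trace lines
>>             if EXC_LINE.search(line) or line.lstrip().startswith(('at ', 'Caused by')):
>>                 dst.write(line)
>>                 count += 1
>>     return count
>> 
>> if __name__ == '__main__':
>>     n = collect_exceptions('karaf.log', 'exceptions.txt')
>>     print('collected %d exception-related lines' % n)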
>> 
>> Thanks,
>> JamO
>> 
>> On 03/21/2017 01:49 PM, Jamo Luhrsen wrote:
>>> Current theory is that our karaf.log is getting a lot more messages now. I found
>>> one job that didn't get aborted. It did run for 5h33m, though:
>>> 
>>> https://jenkins.opendaylight.org/releng/view/netvirt-csit/job/netvirt-csit-3node-openstack-newton-nodl-v2-upstream-transparent-carbon/376/
>>> 
>>> The robot logs didn't get created because the generated output.xml was too big,
>>> so the tool that makes the .html reports failed or quit. Locally, I could create
>>> the .html with that output.xml.
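>>> 
>>> For anyone who wants to regenerate the .html locally, a minimal sketch
>>> using Robot Framework's rebot API (file names are placeholders, not my
>>> exact command):
>>> 
>>> # Minimal sketch: rebuild log/report .html from a downloaded output.xml.
>>> # Assumes Robot Framework is installed locally (pip install robotframework).
>>> from robot import rebot
>>> 
>>> rebot('output.xml',
>>>       log='log.html',
>>>       report='report.html',
>>>       # dropping keyword data of passed tests keeps the result smaller
>>>       removekeywords='PASSED')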
>>> 
>>> We have had this trouble before, where all of a sudden lots more logging comes
>>> in and it breaks our jobs.
>>> 
>>> Still getting to the bottom of it...
>>> 
>>> JamO
>>> 
>>> On 03/21/2017 10:39 AM, Jamo Luhrsen wrote:
>>>> Netvirt, Integration,
>>>> 
>>>> We need to figure out and fix what's wrong with the netvirt 3node carbon csit.
>>>> 
>>>> The jobs are timing out at our jenkins 6h limit, which means we don't
>>>> get any logs either.
>>>> 
>>>> This will likely cause a large backlog in our jenkins queue.
>>>> 
>>>> If anyone has cycles at the moment to help, catch me on IRC.
>>>> 
>>>> Initially, with Alon's help, we know that this job [0] was not seeing
>>>> this trouble, while this job [1] was.
>>>> 
>>>> The difference in ODL patches between the two distros that were used
>>>> includes some controller patches that seem cluster related. Here are all
>>>> the patches that came in between the two (a rough way to check their
>>>> merge times is sketched after the list):
>>>> 
>>>> controller   https://git.opendaylight.org/gerrit/49265     BUG-5280: add frontend state lifecycle
>>>> controller   https://git.opendaylight.org/gerrit/49738     BUG-2138: Use correct actor context in shard lookup.
>>>> controller   https://git.opendaylight.org/gerrit/49663     BUG-2138: Fix shard registration with ProxyProducers.
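>>>> 
>>>> One way to double-check when each of these merged is Gerrit's REST API.
>>>> A rough, untested Python sketch (only the change numbers above are real;
>>>> the rest is generic Gerrit ChangeInfo handling):
>>>> 
>>>> import json
>>>> import urllib.request
>>>> 
>>>> GERRIT = 'https://git.opendaylight.org/gerrit'
>>>> CHANGES = ['49265', '49738', '49663']
>>>> 
>>>> for number in CHANGES:
>>>>     url = '%s/changes/%s' % (GERRIT, number)
>>>>     raw = urllib.request.urlopen(url).read().decode('utf-8')
>>>>     # Gerrit prefixes JSON responses with ")]}'" to prevent XSSI
>>>>     info = json.loads(raw.split('\n', 1)[1])
>>>>     print(number, info.get('status'), info.get('submitted'), info['subject'])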
>>>> 
>>>> From the looks of the console log (all we have) it seems that each
>>>> test case is just taking a long time. I don't know more than that
>>>> at the moment.
>>>> 
>>>> JamO
>>>> 
>>>> 
>>>> 
>>>> [0]
>>>> https://jenkins.opendaylight.org/releng/view/netvirt-csit/job/netvirt-csit-3node-openstack-newton-nodl-v2-upstream-transparent-carbon/373/
>>>> [1]
>>>> https://jenkins.opendaylight.org/releng/view/netvirt-csit/job/netvirt-csit-3node-openstack-newton-nodl-v2-upstream-transparent-carbon/374/
>>>> 
