Thanks Jamo, answers inline:

> On Sep 2, 2016, at 10:51 AM, Jamo Luhrsen <[email protected]> wrote:
> 
> please see inline...
> 
> On 08/31/2016 11:30 PM, Luis Gomez wrote:
>> Hi Jamo,
>> 
>> Thanks for the analysis. As I commented in private to some openflow
>> committers, the openflowplugin main feature (flow services) is "not
>> experimental" in single node:
>> 
>> https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-1node-flow-services-only-boron/
>> 
>> However, the same feature is "experimental" when run in a cluster environment:
>> 
>> https://jenkins.opendaylight.org/releng/view/CSIT-3node/job/openflowplugin-csit-3node-clustering-only-boron/
>> 
>> My guess is that most of the cluster instabilities are due to the blocker bug:
>> 
>> https://bugs.opendaylight.org/show_bug.cgi?id=6554
>> 
>> So if we solve the above in the coming days, there is a chance the openflow
>> cluster will also be "not experimental".
>> 
>> For your comments, see inline:
>> 
>> 
>>> On Aug 31, 2016, at 10:51 PM, Jamo Luhrsen <[email protected]> wrote:
>>> 
>>> For the OpenflowPlugin release review Thursday morning, I have the 
>>> following analysis of their
>>> upstream CSIT for boron, using the most recent boron job result.
>>> 
>>> please note that I do not know whether all delivered features have system
>>> tests or not, so I am only reporting on what exists... which is a LOT!
>>> 
>>> It's hard to know what's really happening here. I think the main
>>> functionality suite "flow-services" is passing 100% and probably gives some
>>> confidence. But with the other suites having what look like basic issues,
>>> I am a bit worried. So, just reporting for now. I have some extra details
>>> below the job listing.
>>> 
>>> NOT-OK  3node-periodic-bulkomatic-clustering-daily-only-boron  (unexpected failures)
>>> NOT-OK  3node-periodic-bulkomatic-clustering-daily-helium-redesign-only-boron  (unexpected failures)
>>> NOT-OK  3node-clustering-only-boron  (unexpected failures)
>>> NOT-OK  3node-clustering-helium-redesign-only-boron  (unexpected failures)
>>> NOT-OK  1node-scalability-helium-redesign-only-boron  (unexpected failures)
>>> NOT-OK  1node-periodic-scale-stats-collection-daily-helium-redesign-only-boron  (unexpected failures)
>>> NOT-OK  1node-periodic-scale-stats-collection-daily-frs-only-boron  (unexpected failures)
>>> NOT-OK  1node-periodic-scalability-daily-helium-redesign-only-boron  (scale test found zero)
>>> NOT-OK  1node-periodic-longevity-only-boron  (unexpected failures)
>>> NOT-OK  1node-periodic-longevity-helium-redesign-only-boron  (unexpected failures)
>>> NOT-OK  1node-periodic-link-scalability-daily-helium-redesign-only-boron  (scale test found zero)
>>> NOT-OK  1node-flow-services-helium-redesign-only-boron  (unexpected failures)
>>> NOT-OK  1node-flow-services-frs-only-boron  (unexpected failures)
>>> 
>>> 
>>> OK      1node-scalability-only-boron
>>> OK      1node-periodic-sw-scalability-daily-only-boron  (scaled to 500 switches)
>>> OK      1node-periodic-sw-scalability-daily-helium-redesign-only-boron  (scaled to 500 switches)
>>> OK      1node-periodic-scale-stats-collection-daily-only-boron
>>> OK      1node-periodic-rpc-time-measure-daily-only-boron
>>> OK      1node-periodic-rpc-time-measure-daily-helium-redesign-only-boron
>>> OK      1node-periodic-link-scalability-daily-only-boron  (??scaling to ~2500 links)
>>> OK      1node-periodic-cbench-daily-only-boron  (critical bug found here)
>>> OK      1node-periodic-cbench-daily-helium-redesign-only-boron  (perf test found zero)
>>> OK      1node-periodic-bulkomatic-perf-daily-only-boron
>>> OK      1node-periodic-bulk-matic-ds-daily-only-boron
>>> OK      1node-periodic-bulk-matic-ds-daily-helium-redesign-only-boron
>>> OK      1node-flow-services-only-boron
>>> OK      1node-flow-services-all-boron
>>> OK      1node-config-performance-only-boron
>>> OK      1node-config-performance-helium-redesign-only-boron
>>> OK      1node-cbench-performance-only-boron  (critical bug found here)
>>> OK      1node-cbench-performance-helium-redesign-only-boron  (perf test found zero)
>>> 
>>> Some failures I saw actually pointed clearly to a bug, but the bug was in a
>>> resolved state, so that means either it's a new type of failure or we have
>>> a regression.
>> 
>> Can you tell me where you see this?
> 
> 
> here's one:
> https://logs.opendaylight.org/releng/jenkins092/openflowplugin-csit-3node-periodic-bulkomatic-clustering-daily-only-boron/77/archives/log.html.gz#s1-s1-t25
> 
> it points to bug 6058 but it's marked RESOLVED.
> 
> not sure if there are others, as I didn't always look at every suite's
> failures once I noticed one that was not meeting expectations.

Good catch, here is a patch to fix this: 
https://git.opendaylight.org/gerrit/#/c/45110/

> 
>>> 
>>> Some failures might be in perf/scale-related tests, and we may decide that
>>> those failures are OK because they are relative levels we are testing
>>> against. However, some of the failures I saw in performance jobs looked
>>> fundamental (e.g. zero flows found).
>> 
>> Is this the Cbench throughput test? If so, we already have a critical bug for it.
> 
> cbench was still showing some numbers in its throughput plot, I think, but
> they were just really low. But here is a good example of what I saw:
> 
> https://jenkins.opendaylight.org/releng/user/jluhrsen/my-views/view/ofp%20boron%20csit/job/openflowplugin-csit-1node-periodic-scalability-daily-helium-redesign-only-boron/plot/Inventory%20Scalability/
> 
> it is for the helium-redesign, so maybe we don't care anymore?

We know about a topology discovery issue that was not fixed because of lack of
resources and other priorities. Because of this, the He plugin test results are
not relevant anymore, and I will remove all these jobs after we release.

> 
>>> 
>>> There were failures in longevity tests that were also worrisome because of
>>> how short the job ran before failing. It seems something basic is breaking
>>> there. The default plugin longevity job has a thread count graph that went
>>> up and to the right over time, which made me think of a thread leak. The
>>> helium plugin never even saw the first group of connected switches and
>>> failed straight away.
>> 
>> The first could be a critical bug, but we never got far enough fixing more
>> fundamental issues to pay attention to this. The second is because the helium
>> plugin LLDP speaker does not work in Boron and probably will not be fixed,
>> because this feature is not shipped as the default plugin.
> 
> maybe it's worth filing a blocker bug against the longevity troubles? I don't
> think the test is very stressful, and if it's failing after a short time maybe
> we have a serious problem that we do not want to release with?

The current blockers surely impact longevity; we can see after they are fixed
whether longevity itself is a blocker or not.

> 
> for the helium-redesign LLDP speaker thing, I think that explains the scale
> test result above. It only stopped working recently though, so it probably
> wouldn't be impossible to find where it broke. But I don't think it's going
> to make it high enough on the list of priorities here.
> 
> 
> Thanks,
> JamO
> 
> 
>>> 
>>> The cbench test fails and points to a bug that is non-blocking, but
>>> critical. Per our expectations this is still OK. I urge openflowplugin to
>>> double-check whether it should be upgraded to blocker or not, but I am sure
>>> they have scrubbed it more than once already.
>> 
>> In my opinion this should be a blocker because 1) it is a regression from
>> Beryllium, and 2) anybody testing ODL performance will hit this issue, and
>> whatever perf report comes out afterwards will harm ODL.
>> 
>>> 
>>> The -frs-only-boron suite looks like it's having major trouble interacting
>>> with a tools VM. I didn't dig too deep, but that's my first guess.
>>> 
>>> 
>>> Thanks,
>>> JamO

_______________________________________________
openflowplugin-dev mailing list
[email protected]
https://lists.opendaylight.org/mailman/listinfo/openflowplugin-dev
