please see inline...
On 08/31/2016 11:30 PM, Luis Gomez wrote:
Hi Jamo,
Thanks for the analysis. As I commented privately to some openflow committers, the openflowplugin main feature (flow services) is "not experimental" in single node:
https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-1node-flow-services-only-boron/
However the same feature is "experimental" when run in cluster environment:
https://jenkins.opendaylight.org/releng/view/CSIT-3node/job/openflowplugin-csit-3node-clustering-only-boron/
My guess is that most of the cluster instabilities are due to blocker bug:
https://bugs.opendaylight.org/show_bug.cgi?id=6554
So if we solve the above in the coming days, there is a good chance the openflow cluster will also be "not experimental".
For your comments see in-line:
On Aug 31, 2016, at 10:51 PM, Jamo Luhrsen <[email protected]> wrote:
For the OpenflowPlugin release review Thursday morning, I have the following analysis of their upstream CSIT for Boron, using the most recent Boron job results.
Please note that I do not know whether all delivered features have system tests, so I am only reporting on what exists... which is a LOT!
It's hard to know what's really happening here. I think the main functionality suite "flow-services" is passing 100% and probably gives some confidence. But with the other suites having what look like basic issues, I am a bit worried. So, just reporting for now. I have some extra details below the job listing.
NOT-OK 3node-periodic-bulkomatic-clustering-daily-only-boron (unexpected failures)
NOT-OK 3node-periodic-bulkomatic-clustering-daily-helium-redesign-only-boron (unexpected failures)
NOT-OK 3node-clustering-only-boron (unexpected failures)
NOT-OK 3node-clustering-helium-redesign-only-boron (unexpected failures)
NOT-OK 1node-scalability-helium-redesign-only-boron (unexpected failures)
NOT-OK 1node-periodic-scale-stats-collection-daily-helium-redesign-only-boron (unexpected failures)
NOT-OK 1node-periodic-scale-stats-collection-daily-frs-only-boron (unexpected failures)
NOT-OK 1node-periodic-scalability-daily-helium-redesign-only-boron (scale test found zero)
NOT-OK 1node-periodic-longevity-only-boron (unexpected failures)
NOT-OK 1node-periodic-longevity-helium-redesign-only-boron (unexpected failures)
NOT-OK 1node-periodic-link-scalability-daily-helium-redesign-only-boron (scale test found zero)
NOT-OK 1node-flow-services-helium-redesign-only-boron (unexpected failures)
NOT-OK 1node-flow-services-frs-only-boron (unexpected failures)
OK     1node-scalability-only-boron
OK     1node-periodic-sw-scalability-daily-only-boron (scaled to 500 switches)
OK     1node-periodic-sw-scalability-daily-helium-redesign-only-boron (scaled to 500 switches)
OK     1node-periodic-scale-stats-collection-daily-only-boron
OK     1node-periodic-rpc-time-measure-daily-only-boron
OK     1node-periodic-rpc-time-measure-daily-helium-redesign-only-boron
OK     1node-periodic-link-scalability-daily-only-boron (?? scaling to ~2500 links)
OK     1node-periodic-cbench-daily-only-boron (critical bug found here)
OK     1node-periodic-cbench-daily-helium-redesign-only-boron (perf test found zero)
OK     1node-periodic-bulkomatic-perf-daily-only-boron
OK     1node-periodic-bulk-matic-ds-daily-only-boron
OK     1node-periodic-bulk-matic-ds-daily-helium-redesign-only-boron
OK     1node-flow-services-only-boron
OK     1node-flow-services-all-boron
OK     1node-config-performance-only-boron
OK     1node-config-performance-helium-redesign-only-boron
OK     1node-cbench-performance-only-boron (critical bug found here)
OK     1node-cbench-performance-helium-redesign-only-boron (perf test found zero)
Some failures I saw actually pointed clearly to a bug, but the bug was in the RESOLVED state, which means either it's a new type of failure or we have a regression.
Can you tell me where you see this?
Here's one:
https://logs.opendaylight.org/releng/jenkins092/openflowplugin-csit-3node-periodic-bulkomatic-clustering-daily-only-boron/77/archives/log.html.gz#s1-s1-t25
It points to bug 6058, but that bug is marked RESOLVED.
I'm not sure if there are others, as I didn't always look at every failure in a suite once I noticed one that was not meeting expectations.
Some failures might be in perf/scale-related tests, and we may decide that those failures are OK because they are relative levels we are testing against. However, some of the failures I saw in performance jobs looked to be fundamental (e.g., zero flows found).
Is this the cbench throughput test? If so, we already have a critical bug.
cbench was still showing some numbers in its throughput plot, I think, but they were just really low. But here is a good example of what I saw:
https://jenkins.opendaylight.org/releng/user/jluhrsen/my-views/view/ofp%20boron%20csit/job/openflowplugin-csit-1node-periodic-scalability-daily-helium-redesign-only-boron/plot/Inventory%20Scalability/
It is for the helium-redesign, so maybe we don't care anymore?
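As an aside, here is a minimal sketch of how one could sanity-check the "scale test found zero" symptom directly against a controller, by counting switches and flows in the operational inventory. It assumes the Boron-era RESTCONF endpoint on port 8181 and the default admin/admin credentials; the addresses and credentials are assumptions, and this is an illustration, not what the CSIT suites actually run:

# Hedged sketch: count switches and flows in ODL's operational inventory.
# Assumes Boron-era RESTCONF on port 8181 and default admin/admin creds
# (adjust for your deployment); illustrative, not the real CSIT check.
import requests

ODL = "http://127.0.0.1:8181"  # assumed controller address
AUTH = ("admin", "admin")      # assumed default credentials

def inventory_counts():
    url = ODL + "/restconf/operational/opendaylight-inventory:nodes"
    resp = requests.get(url, auth=AUTH, headers={"Accept": "application/json"})
    resp.raise_for_status()
    nodes = resp.json().get("nodes", {}).get("node", [])
    flows = 0
    for node in nodes:
        for table in node.get("flow-node-inventory:table", []):
            flows += len(table.get("flow", []))
    return len(nodes), flows

if __name__ == "__main__":
    switches, flows = inventory_counts()
    # A "scale test found zero" result would show up here as 0/0 even
    # if the harness reported success elsewhere.
    print("switches=%d flows=%d" % (switches, flows))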
There were failures in longevity tests that were also worrisome because of how briefly the job ran before failing. It seems something basic is breaking there. The default plugin longevity job has a thread count graph that trended up and to the right over time, which made me think about a thread leak (see the monitoring sketch after this exchange). The helium plugin never even saw the first group of connected switches and failed straight away.
The first could be a critical bug, but we never got far enough past fixing more fundamental issues to pay attention to this. The second is because the helium plugin's LLDP speaker does not work in Boron, and it probably will not be fixed because this feature is not shipped in the default plugin.
Maybe it's worth filing a blocker bug against the longevity troubles? I don't think the test is very stressful, and if it's failing after a short time, maybe we have a serious problem that we do not want to release with?
For the helium-redesign LLDP speaker thing, I think that explains the scale test result above. It only stopped working recently, though, so it probably wouldn't be impossible to find where it broke. But I don't think it's going to make it high enough on the list of priorities here.
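On the suspected thread leak mentioned above: a minimal sketch of how one might watch the controller JVM's thread count over time to confirm or rule it out. It assumes a Linux host and a karaf-based controller launch (the "karaf" pattern and sampling interval are assumptions), and uses ps(1)'s NLWP column; purely illustrative, not part of the CSIT jobs:

# Hedged sketch: sample the controller JVM's thread count over time.
# Assumes a Linux host and a karaf-based launch; the pgrep pattern is
# an assumption about how the controller process is named.
import subprocess
import time

def java_pid(pattern="karaf"):
    """Find the controller's JVM PID (assumed karaf-based launch)."""
    out = subprocess.check_output(["pgrep", "-f", pattern], text=True)
    return int(out.split()[0])

def thread_count(pid):
    """NLWP = number of light-weight processes (threads) for the PID."""
    out = subprocess.check_output(
        ["ps", "-o", "nlwp=", "-p", str(pid)], text=True)
    return int(out.strip())

if __name__ == "__main__":
    pid = java_pid()
    # Sample once a minute; a steady upward trend with no plateau is
    # the "up and to the right" shape that suggests a leak.
    for _ in range(60):
        print(time.strftime("%H:%M:%S"), thread_count(pid))
        time.sleep(60)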
Thanks,
JamO
The cbench test fails and points to a bug that is non-blocking but critical. Per our expectations this is still OK. I urge openflowplugin to double-check whether it should be upgraded to blocker, but I am sure they have scrubbed it more than once already.
In my opinion this should be a blocker because 1) it is a regression from Beryllium, and 2) anybody testing ODL performance will hit this issue, and whatever performance report comes out afterward will harm ODL.
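For anyone wanting to reproduce the regression locally, a hedged sketch of driving cbench in throughput mode from Python. It assumes the standard oflops cbench binary is installed and that the controller listens for OpenFlow on 6633; the address, port, and flag values are assumptions, so verify against cbench -h on your build:

# Hedged sketch: run cbench in throughput mode against a local controller.
# Assumes the oflops "cbench" binary is on PATH and OpenFlow on 6633.
import subprocess

CMD = [
    "cbench",
    "-c", "127.0.0.1",  # controller address (assumption)
    "-p", "6633",       # OpenFlow listen port (assumed default)
    "-m", "10000",      # milliseconds per test iteration
    "-l", "5",          # number of loops
    "-s", "16",         # emulated switches
    "-M", "1000",       # unique MACs (hosts) per switch
    "-t",               # throughput mode (vs. default latency mode)
]

if __name__ == "__main__":
    # A regression like the one discussed above would show up as
    # drastically lower flows/sec than a Beryllium baseline run.
    subprocess.run(CMD, check=True)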
The -frs-only-boron suite looks like it's having major trouble interacting with a tools VM. I didn't dig too deep, but that's my first guess.
Thanks,
JamO