Hey,
Al, Daniel, shall we sit together during the lunch break today?
Just to give an idea of our problem, the scenario looks like this:
A shared OpenStack/ODL controller with 4 CPU cores, and 3-4 compute
nodes, each having 1 OVS dpnode. So we are far away from 400 switches.
During this time the CPU load from the OpenStack services is rather low,
but the load from ODL is very high. This is the bug we filed with ODL:
https://bugs.opendaylight.org/show_bug.cgi?id=8186
And this is our corresponding jira issue in OPNFV:
https://jira.opnfv.org/browse/SDNVPN-144
BR Nikolas
On 11.06.2017 11:33, Tim Irnich wrote:
Luis, I have not looked at the test you link below but what we have
observed is that vSwitch connects and disconnects cause a high load on
ODL, and that the processing of heartbeat messages from other, already
connected vSwitches can get delayed due to this, causing a cascade
effect.
/Tim
*From:* Luis Gomez [mailto:[email protected]]
*Sent:* Friday, June 09, 2017 17:07
*To:* Tim Irnich <[email protected]>
*Cc:* MORTON, ALFRED C (AL) <[email protected]>; Daniel Farrell
<[email protected]>; [email protected];
[email protected]; Nikolas Hermanns
<[email protected]>
*Subject:* Re: [openflowplugin-dev] [integration-dev] Many OVS
connects/disconnects causing high load, disconnects, failure
I guess we are talking about OpenFlow connections to OVS switches
here. If so, what is the high-load scenario the controller is in for
this test? For a single controller (4 CPUs) we support up to 400
switches loaded with 10K flows in Carbon:
https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-1node-periodic-sw-scalability-daily-only-carbon/plot/Switch%20Scalability/
Soon I will bring a test to see if this number holds in a cluster.
BR/Luis
On Jun 9, 2017, at 5:05 AM, Tim Irnich <[email protected]> wrote:
Al, I think you’re hitting the nail on the head here. We were
thinking the same: giving heartbeat messages priority over other
message processing should prevent the cascading effect we have
seen. Not sure if the current framework allows this though…
Regards, Tim
*From:* MORTON, ALFRED C (AL) [mailto:[email protected]]
*Sent:* Friday, June 09, 2017 14:02
*To:* Daniel Farrell <[email protected]>; [email protected];
[email protected]
*Cc:* Tim Irnich <[email protected]>; Nikolas Hermanns
<[email protected]>
*Subject:* RE: [integration-dev] Many OVS connects/disconnects
causing high load, disconnects, failure
Hi Daniel,
(I’m not impersonating Luis or Jamo, but measuring lost
southbound packets has been one of my pet projects, as you know...)
If we add the Latte golang tool that Nikkos contributed
to the OPNFV Cperf project, we should be able to
correlate ODL load levels (cbench) with OVS message loss ratios
(and the OVS disconnects, I’m assuming that heartbeats
have the same priority as PACKETINs for ODL processing,
and maybe prioritization is part of the solution...)
Definitely worth discussing further in Beijing next week,
Al
*From:* [email protected] [mailto:[email protected]] *On
Behalf Of* Daniel Farrell
*Sent:* Friday, June 09, 2017 6:51 AM
*To:* [email protected]; [email protected]
*Cc:* Tim Irnich; Nikolas Hermanns
*Subject:* [integration-dev] Many OVS connects/disconnects causing
high load, disconnects, failure
Hey Integration/Test, openflowplugin,
OPNFV vswitch perf folks are reporting ODL problems caused by lots
of OVS disconnects. See the description below.
@Luis, Jamo - What's the most relevant ODL test?
@Others - Can we fix this?
Thanks,
Daniel Farrell
On Fri, Jun 9, 2017 at 6:19 AM Nikolas Hermanns
<[email protected]> wrote:
Hey Daniel,
I hope you will come to the OPNFV Summit next week :-D. I
would like to discuss a possible new addition to vsperf with
you. We have an issue where many OVS connects and disconnects
drive ODL into high load, and as a result the heartbeats from
OVS no longer reach ODL. Then even more switches disconnect,
and finally the whole cluster loses networking.
There are some workarounds for that, but basically we would
like to set up a test case that tests the number of switches ODL
can easily handle. Not sure yet, but something like that.
Can we have a small chat about it next week?
Reach out to me:
+491729607904 (WhatsApp + SMS)
[email protected] (sometimes faster + Hangouts)
Or just this mail address.
BR Nikolas
_______________________________________________
openflowplugin-dev mailing list
[email protected]
https://lists.opendaylight.org/mailman/listinfo/openflowplugin-dev