Thanks Alexis. Just to reconfirm, so that I'm sure I understand: all that you did was to selectively disable some of the switch features, and that helped you increase from 50 to 250 switches, is that correct? Or did you do something else as well?

Thanks
Shuva

On Thu, Mar 03, 2016 at 1:11 PM, Alexis de Talhouët <[email protected]> wrote:

Hey Muthu,

In the testing mentioned in this thread, yes, statistics collection was enabled.
However, I don’t keep adding flows: I connect a switch, then add two flows to that switch. But I do that many times; so far I’ve connected 90 containers.
And then I keep disconnecting/reconnecting switches to make sure it keeps working fine.

I recently noticed something regarding flow reconciliation: there is an intermittent issue that came into stable/lithium after gerrit d19e54f3f0f9a85e87e70ca3fb97a2cb7a1bab85 [0]. A lot of things came in and I wasn’t able to isolate the culprit.
In the scenario above, where I connect/disconnect switches, 95% of the time the flows are back in the switch; 5% of the time, either no flows or only one.

Scalability-wise, so far I can tell that disabling statistics collection isn’t changing anything. What did drastically change everything was removing some switch capabilities in openflowplugin/openflowplugin-impl/src/main/java/org/opendaylight/openflowplugin/impl/util/DeviceStateUtil.java [1].
Although I don’t know if this makes sense, I will send a mail to ofp-dev to get more input on whether or not it makes sense to disable some capabilities. If it does, I’ll submit a patch to make this configurable.
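For context, the block in question looks roughly like the following. This is a sketch from memory of the stable/lithium DeviceStateUtil [1], so the exact setter names may differ; the experiment amounts to hard-coding some of these capability-driven flags to false so the corresponding collectors never run:

    static void setDeviceStateBasedOnV13Capabilities(final DeviceState deviceState,
                                                     final Capabilities capabilities) {
        // Each flag tells the plugin whether to run the corresponding
        // statistics collector for this device.
        deviceState.setFlowStatisticsAvailable(capabilities.isOFPCFLOWSTATS());
        deviceState.setTableStatisticsAvailable(capabilities.isOFPCTABLESTATS());
        // Hard-coded off for the scalability experiment; originally these
        // mirrored isOFPCPORTSTATS() / isOFPCQUEUESTATS() / isOFPCGROUPSTATS().
        deviceState.setPortStatisticsAvailable(false);
        deviceState.setQueueStatisticsAvailable(false);
        deviceState.setGroupAvailable(false);
    }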

In my use case I only need port stats, and I need them only when the switch connects, so only once; that’s why I disabled statistics collection as well.
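If anyone wants to reproduce the stats-off part: the Li design exposes a work-mode RPC for this (see Michal’s note further down about disabling it via RPC). Below is a hedged sketch of calling it over RESTCONF from plain Java; the RPC path, mode name, port and credentials are assumptions based on the stable/lithium statistics-manager-control model, so verify them against your build:

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import javax.xml.bind.DatatypeConverter;

    public class DisableStatsCollection {
        public static void main(String[] args) throws Exception {
            // Assumed RPC path from the Li-design statistics-manager-control model.
            URL url = new URL("http://localhost:8181/restconf/operations/"
                    + "statistics-manager-control:change-statistics-work-mode");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setRequestProperty("Authorization", "Basic "
                    + DatatypeConverter.printBase64Binary("admin:admin".getBytes("UTF-8")));
            conn.setDoOutput(true);
            // FULLYDISABLED turns periodic collection off; COLLECTALL restores it.
            byte[] body = "{\"input\":{\"mode\":\"FULLYDISABLED\"}}".getBytes("UTF-8");
            try (OutputStream os = conn.getOutputStream()) {
                os.write(body);
            }
            System.out.println("HTTP " + conn.getResponseCode());
        }
    }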
I can now scale to more than 250 switches; I haven’t reached the maximum yet (OVS 2.5).

Hope this helps.

Thanks,
Alexis

[0]: https://git.opendaylight.org/gerrit/#/c/33948/
[1]: https://github.com/opendaylight/openflowplugin/blob/stable/lithium/openflowplugin-impl/src/main/java/org/opendaylight/openflowplugin/impl/util/DeviceStateUtil.java#L30-L36



On Mar 3, 2016, at 3:57 PM, Muthukumaran K <[email protected]> wrote:

Hi Alexis,

In addition to what Shuva mentioned - is statistics collection “on” in this case, since your scenario keeps adding flows?

Regards
Muthu


From: [email protected] [mailto:[email protected]] On Behalf Of Alexis de Talhouët
Sent: Friday, March 04, 2016 1:43 AM
To: Shuva Jyoti Kar
Cc: [email protected]
Subject: Re: [openflowplugin-dev] Scalability issues

That’s a good question to which I’ll be able to respond in the coming weeks.

I’m going step by step: first make the connections, then make sure they persist.
My goal is around 800 switches for one controller.

Thanks,
Alexis
On Mar 3, 2016, at 3:09 PM, Shuva Jyoti Kar <[email protected]> 
wrote:

Thanks Alexis. How stable are the connections if you leave them for a while?

Thanks
shuva
On Thu, Mar 03, 2016 at 12:07 PM, Alexis de Talhouët <[email protected]> 
wrote:

Hello Shuva,

I’m using the stable/lithium version.
As for my scenario, it’s a single node, not a cluster. And yes, I’m installing 2 flows per switch.

Thanks,
Alexis
On Mar 3, 2016, at 2:58 PM, Shuva Jyoti Kar <[email protected]> 
wrote:

Hi Alexis,

I understand that you are using the Lithium model of the ofplugin, am I correct? Also, is it in a clustered environment or a single-node setup?
Did you try installing some flows into each of the switches to check how they behave?

Thanks
Shuva

Date: Wed, 2 Mar 2016 21:41:09 -0800
From: Jamo Luhrsen <[email protected]>
To: Alexis de Talhouët <[email protected]>,     Abhijit Kumbhare
                <[email protected]>
Cc: "[email protected]"
                <[email protected]>
Subject: Re: [openflowplugin-dev] Scalability issues
Message-ID: <[email protected]>
Content-Type: text/plain; charset=utf-8



On 02/19/2016 02:10 PM, Alexis de Talhouët wrote:
> So far my results are:
>
> OVS 2.4.0: ODL configured with 2G of mem -> max is ~50 switches connected
> OVS 2.3.1: ODL configured with 256MB of mem -> I currently have 150 switches connected, can’t scale more due to infra limits.

Alexis, I think this is probably worth putting a bugzilla up.

How much horsepower do you need per docker ovs instance?  We need to get this 
automated in CSIT.  Marcus from ovsdb wants to do similar tests with ovsdb.

JamO


> I will pursue my testing next week.
>
> Thanks,
> Alexis
>
>> On Feb 19, 2016, at 5:06 PM, Abhijit Kumbhare <[email protected]> wrote:
>>
>> Interesting. I wonder - why that would be?
>>
>> On Fri, Feb 19, 2016 at 1:19 PM, Alexis de Talhouët <[email protected]> wrote:
>>
>>     OVS 2.3.x scales fine
>>     OVS 2.4.x doesn’t scale well.
>>
>>     Here is also the docker file for ovs 2.4.1
>>
>>
>>
>>>     On Feb 19, 2016, at 11:20 AM, Alexis de Talhouët <[email protected]> wrote:
>>>
>>>>     can I use your containers?  do you have any scripts/tools to bring 
>>>> things up/down?
>>>
>>>     Sure, attached is a tar file containing all the scripts / config / dockerfile I’m using to set up docker containers
>>>     emulating OvS.
>>>     FYI: it’s ovs 2.3.0 and not 2.4.0 anymore
>>>
>>>     Also, forget about this whole mail thread, something in my private container must be breaking OVS behaviour, I
>>>     don’t know what yet.
>>>
>>>     With the docker file attached here, I can scale 90+ without any 
>>> trouble...
>>>
>>>     Thanks,
>>>     Alexis
>>>
>>>     <ovs_scalability_setup.tar.gz>
>>>
>>>>     On Feb 18, 2016, at 6:07 PM, Jamo Luhrsen <[email protected]> wrote:
>>>>
>>>>     inline...
>>>>
>>>>     On 02/18/2016 02:58 PM, Alexis de Talhouët wrote:
>>>>>     I’m running OVS 2.4, against stable/lithium, openflowplugin-li
>>>>
>>>>
>>>>     so this is one difference between CSIT and your setup, in addition to 
>>>> the whole
>>>>     containers vs mininet.
>>>>
>>>>>     I never scaled up to 1k; this was in the CSIT job.
>>>>>     In a real scenario, I scaled to ~400. But that was even before clustering came into play in ofp lithium.
>>>>>
>>>>>     I think the logs I sent have trace logging for openflowplugin and openflowjava; if that’s not the case I could
>>>>>     resubmit the logs.
>>>>>     I removed some of them in openflowjava because it was way too chatty (logging all message content between ovs
>>>>>     <---> odl)
>>>>>
>>>>>     Unfortunately those IOExceptions happen after the whole thing blows up. I was able to narrow down some logs in
>>>>>     openflowjava
>>>>>     to see the first disconnected event. As mentioned in a previous mail (in this mail thread), it’s the device that is
>>>>>     issuing the disconnect:
>>>>>
>>>>>>     2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | OFFrameDecoder  
>>>>>>                  | 201 -
>>>>>>     org.opendaylight.openflowjava.openflow-protocol-impl - 
>>>>>> 0.6.4.SNAPSHOT | skipping bytebuf - too few bytes for
>>>>>>     header: 0 < 8
>>>>>>     2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | 
>>>>>> OFVersionDetector                | 201 -
>>>>>>     org.opendaylight.openflowjava.openflow-protocol-impl - 
>>>>>> 0.6.4.SNAPSHOT | not enough data
>>>>>>     2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | 
>>>>>> DelegatingInboundHandler         | 201 -
>>>>>>     org.opendaylight.openflowjava.openflow-protocol-impl - 
>>>>>> 0.6.4.SNAPSHOT | Channel inactive
>>>>>>     2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | 
>>>>>> ConnectionAdapterImpl            | 201 -
>>>>>>     org.opendaylight.openflowjava.openflow-protocol-impl - 
>>>>>> 0.6.4.SNAPSHOT | ConsumeIntern msg on [id: 0x1efab5fb,
>>>>>>     /172.18.0.49:36983 <http://172.18.0.49:36983/> :> 
>>>>>> /192.168.1.159:6633 <http://192.168.1.159:6633/>]
>>>>>>     2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | 
>>>>>> ConnectionAdapterImpl            | 201 -
>>>>>>     org.opendaylight.openflowjava.openflow-protocol-impl - 
>>>>>> 0.6.4.SNAPSHOT | ConsumeIntern msg - DisconnectEvent
>>>>>>     2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | 
>>>>>> ConnectionContextImpl            | 205 -
>>>>>>     org.opendaylight.openflowplugin.impl - 0.1.4.SNAPSHOT | 
>>>>>> disconnecting: node=/172.18.0.49:36983|auxId=0|connection
>>>>>>     state = RIP
>>>>>
>>>>>     Those logs come from another run, so they are not in the logs I sent earlier, although the behaviour is always the same.
>>>>>
>>>>>     Regarding the memory, I don’t want to add more than 2G of memory because, and I tested it, the more memory I add,
>>>>>     the more I can scale. But as you pointed out,
>>>>>     this issue is not an OOM error. Thus I’d rather fail at 2G (fewer docker containers to spawn each run, ~50).
>>>>
>>>>     so, maybe reduce your memory then to simplify the reproducing steps.  Since you know that increasing
>>>>     memory allows you to scale further but still hit the problem, let's make it easier to hit.  how far
>>>>     can you go with the max mem set to 500M, if you are only loading ofp-li?
>>>>
>>>>>     I definitely need some help here, because I can’t sort myself out in the openflowplugin + openflowjava codebase…
>>>>>     But I believe I already have Michal’s attention :)
>>>>
>>>>     can I use your containers?  do you have any scripts/tools to bring 
>>>> things up/down?
>>>>     I might be able to try and reproduce myself.  I like breaking
>>>> things :)
>>>>
>>>>     JamO
>>>>
>>>>
>>>>>
>>>>>     Thanks,
>>>>>     Alexis
>>>>>
>>>>>
>>>>>>     On Feb 18, 2016, at 5:44 PM, Jamo Luhrsen <[email protected]> wrote:
>>>>>>
>>>>>>     Alexis,  don't worry about filing a bug just to give us a common 
>>>>>> place to work/comment, even
>>>>>>     if we close it later because of something outside of ODL.  Email is 
>>>>>> fine too.
>>>>>>
>>>>>>     what ovs version do you have in your containers?  this test sounds 
>>>>>> great.
>>>>>>
>>>>>>     Luis is right that if you were scaling well past 1k in the past, but now it falls over at
>>>>>>     50, it sounds like a bug.
>>>>>>
>>>>>>     Oh, you can try increasing the jvm max_mem from the default of 2G just as a data point.  The
>>>>>>     fact that you don't get OOMs makes me think memory might not be the final bottleneck.
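One grounded way to confirm the cap the JVM actually took: Runtime.maxMemory() reports the effective -Xmx, so a scratch class run with the same JVM options shows it directly:

    public class MaxMemCheck {
        public static void main(String[] args) {
            // Reports the effective max heap (-Xmx), in MB.
            long maxBytes = Runtime.getRuntime().maxMemory();
            System.out.printf("JVM max heap: %d MB%n", maxBytes / (1024 * 1024));
        }
    }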
>>>>>>
>>>>>>     you could enable debug/trace logs in the right modules (need ofp 
>>>>>> devs to tell us that)
>>>>>>     for a little more info.
>>>>>>
>>>>>>     I've seen those IOExceptions before and always assumed it was from an OF switch doing a
>>>>>>     hard RST on its connection.
>>>>>>
>>>>>>
>>>>>>     Thanks,
>>>>>>     JamO
>>>>>>
>>>>>>
>>>>>>
>>>>>>     On 02/18/2016 11:48 AM, Luis Gomez wrote:
>>>>>>>     If the same test worked 6-8 months ago this seems like a bug, but 
>>>>>>> please feel free to open it whenever you
>>>>>>>     are sure.
>>>>>>>
>>>>>>>>     On Feb 18, 2016, at 11:45 AM, Alexis de Talhouët <[email protected]> wrote:
>>>>>>>>
>>>>>>>>     Hello Luis,
>>>>>>>>
>>>>>>>>     For sure I’m willing to open a bug, but first I want to make sure there is a bug and that I’m not doing
>>>>>>>>     something wrong.
>>>>>>>>     In ODL’s infra, there is a test to find the maximum number of switches that can be connected to ODL, and
>>>>>>>>     this test reaches ~500 [0].
>>>>>>>>     I was able to scale up to 1090 switches [1] using the CSIT job in the sandbox.
>>>>>>>>     I believe the CSIT test is different in that the switches are emulated in one mininet VM, whereas I’m
>>>>>>>>     connecting OVS instances from separate containers.
>>>>>>>>
>>>>>>>>     6-8 months ago, I was able to perform the same test and scale with OVS docker containers up to ~400 before
>>>>>>>>     ODL started crashing (with some optimization done behind the scenes, i.e. ulimit, mem, cpu, GC…).
>>>>>>>>     Now I’m not able to scale past 100 with the same configuration.
>>>>>>>>
>>>>>>>>     FYI: I just took a quick look at the CSIT test [0] karaf.log; it seems the test is actually failing but this is not
>>>>>>>>     correctly advertised… switch connections are dropped.
>>>>>>>>     Look for these:
>>>>>>>>     2016-02-18 07:07:51,741 | WARN  | entLoopGroup-6-6 | OFFrameDecoder
>>>>>>>>                   | 181 -
>>>>>>>>     org.opendaylight.openflowjava.openflow-protocol-impl - 
>>>>>>>> 0.6.4.SNAPSHOT | Unexpected exception from downstream.
>>>>>>>>     java.io.IOException: Connection reset by peer
>>>>>>>>     at sun.nio.ch.FileDispatcherImpl.read0(Native Method)[:1.7.0_85]
>>>>>>>>     at 
>>>>>>>> sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)[:1.7.0_85]
>>>>>>>>     at 
>>>>>>>> sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)[:1.7.0_85]
>>>>>>>>     at sun.nio.ch.IOUtil.read(IOUtil.java:192)[:1.7.0_85]
>>>>>>>>     at 
>>>>>>>> sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:384)[:1.7.0_85]
>>>>>>>>     at
>>>>>>>>     
>>>>>>>> io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311)[111:io.netty.buffer:4.0.26.Final]
>>>>>>>>     at 
>>>>>>>> io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)[111:io.netty.buffer:4.0.26.Final]
>>>>>>>>     at
>>>>>>>>     
>>>>>>>> io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:241)[109:io.netty.transport:4.0.26.Final]
>>>>>>>>     at
>>>>>>>>     
>>>>>>>> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)[109:io.netty.transport:4.0.26.Final]
>>>>>>>>     at
>>>>>>>>     
>>>>>>>> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)[109:io.netty.transport:4.0.26.Final]
>>>>>>>>     at
>>>>>>>>     
>>>>>>>> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)[109:io.netty.transport:4.0.26.Final]
>>>>>>>>     at
>>>>>>>>     
>>>>>>>> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)[109:io.netty.transport:4.0.26.Final]
>>>>>>>>     at 
>>>>>>>> io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:349)[109:io.netty.transport:4.0.26.Final]
>>>>>>>>     at
>>>>>>>>     
>>>>>>>> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)[110:io.netty.common:4.0.26.Final]
>>>>>>>>     at
>>>>>>>>     
>>>>>>>> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)[110:io.netty.common:4.0.26.Final]
>>>>>>>>     at java.lang.Thread.run(Thread.java:745)[:1.7.0_85]
>>>>>>>>
>>>>>>>>
>>>>>>>>     [0]: 
>>>>>>>> https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-1node-periodic-scalability-daily-only-stable-lithium/
>>>>>>>>     [1]: https://git.opendaylight.org/gerrit/#/c/33213/
>>>>>>>>
>>>>>>>>>     On Feb 18, 2016, at 2:28 PM, Luis Gomez <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>     Alexis, thanks very much for sharing this test. Would you mind opening a bug with all this info so we can
>>>>>>>>>     track this?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>     On Feb 18, 2016, at 7:29 AM, Alexis de Talhouët <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>     Hi Michal,
>>>>>>>>>>
>>>>>>>>>>     ODL memory is capped at 2GB; the more memory I add, the more OVS I can connect. Regarding CPU, it’s
>>>>>>>>>>     around 10-20% when connecting new OVS, with some peaks to 80%.
>>>>>>>>>>
>>>>>>>>>>     After some investigation, here is what I observed:
>>>>>>>>>>     Let’s say I have 50 switches connected, stat manager disabled. I have one open socket per switch, plus an
>>>>>>>>>>     additional one for the controller.
>>>>>>>>>>     Then I connect a new switch (2016-02-18 09:35:08,059), 51 switches… something happens causing all
>>>>>>>>>>     connections to be dropped (by the device?) and then ODL
>>>>>>>>>>     tries to recreate them and goes into a crazy loop where it is never able to re-establish communication, but keeps
>>>>>>>>>>     creating new sockets.
>>>>>>>>>>     I’m suspecting something being garbage collected due to lack of memory, although there are no OOM errors.
>>>>>>>>>>
>>>>>>>>>>     Attached are the YourKit Java Profiler analysis for the described scenario and the logs [1].
>>>>>>>>>>
>>>>>>>>>>     Thanks,
>>>>>>>>>>     Alexis
>>>>>>>>>>
>>>>>>>>>>     [1]: https://www.dropbox.com/sh/dgqeqv4j76zwbh3/AACim0za1fUozc7DlYJ4fsMJa?dl=0
>>>>>>>>>>
>>>>>>>>>>>     On Feb 9, 2016, at 8:59 AM, Michal Rehak -X (mirehak - PANTHEON TECHNOLOGIES at Cisco) <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>     Hi Alexis,
>>>>>>>>>>>     I am not sure how OVS uses threads - in the changelog there are some concurrency-related improvements
>>>>>>>>>>>     in 2.1.3 and 2.3.
>>>>>>>>>>>     Also, I guess docker can be constrained regarding its assigned resources.
>>>>>>>>>>>
>>>>>>>>>>>     For you, the most important thing is the number of cores used by the controller.
>>>>>>>>>>>
>>>>>>>>>>>     What do your CPU and memory consumption look like when you connect all the OVSs?
>>>>>>>>>>>
>>>>>>>>>>>     Regards,
>>>>>>>>>>>     Michal
>>>>>>>>>>>
>>>>>>>>>>>     ________________________________________
>>>>>>>>>>>     From: Alexis de Talhouët <[email protected]>
>>>>>>>>>>>     Sent: Tuesday, February 9, 2016 14:44
>>>>>>>>>>>     To: Michal Rehak -X (mirehak - PANTHEON TECHNOLOGIES at Cisco)
>>>>>>>>>>>     Cc: [email protected]
>>>>>>>>>>>     Subject: Re: [openflowplugin-dev] Scalability issues
>>>>>>>>>>>
>>>>>>>>>>>     Hello Michal,
>>>>>>>>>>>
>>>>>>>>>>>     Yes, all the OvS instances I’m running have unique DPIDs.
>>>>>>>>>>>
>>>>>>>>>>>     Regarding the thread limit for netty, I’m running the tests on a server that has 28 CPUs.
>>>>>>>>>>>
>>>>>>>>>>>     Is each OvS instance assigned its own thread?
>>>>>>>>>>>
>>>>>>>>>>>     Thanks,
>>>>>>>>>>>     Alexis
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>     On Feb 9, 2016, at 3:42 AM, Michal Rehak -X (mirehak - PANTHEON TECHNOLOGIES at Cisco) <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>     Hi Alexis,
>>>>>>>>>>>>     in the Li design, the stats manager is not a standalone app but part of the core of ofPlugin. You can
>>>>>>>>>>>>     disable it via RPC.
>>>>>>>>>>>>
>>>>>>>>>>>>     Just a question regarding your ovs setup: are all your DPIDs unique?
>>>>>>>>>>>>
>>>>>>>>>>>>     Also, there is a limit for netty in the form of the number of threads used. By default it uses 2 x
>>>>>>>>>>>>     cpu_cores_amount. You should have as many cores as possible in order to get maximum performance.
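A minimal, self-contained sketch of the sizing rule described above, assuming the netty 4.0.x line seen in the stack traces in this thread: with no argument, NioEventLoopGroup sizes itself to 2 x the available cores; the 4 x value below is just a hypothetical experiment.

    import io.netty.channel.nio.NioEventLoopGroup;

    public class EventLoopSizing {
        public static void main(String[] args) throws InterruptedException {
            int cores = Runtime.getRuntime().availableProcessors();

            // Default sizing: equivalent to new NioEventLoopGroup(2 * cores).
            NioEventLoopGroup defaultGroup = new NioEventLoopGroup();

            // Explicit sizing, e.g. to experiment with more worker threads
            // on a many-core box like the 28-CPU server mentioned in this thread.
            NioEventLoopGroup sizedGroup = new NioEventLoopGroup(4 * cores);

            System.out.println("cores=" + cores + ", default event loops=" + (2 * cores));

            defaultGroup.shutdownGracefully().sync();
            sizedGroup.shutdownGracefully().sync();
        }
    }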
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>     Regards,
>>>>>>>>>>>>     Michal
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>     ________________________________________
>>>>>>>>>>>>     From: [email protected] on behalf of Alexis de Talhouët <[email protected]>
>>>>>>>>>>>>     Sent: Tuesday, February 9, 2016 00:45
>>>>>>>>>>>>     To: [email protected]
>>>>>>>>>>>>     Subject: [openflowplugin-dev] Scalability issues
>>>>>>>>>>>>
>>>>>>>>>>>>     Hello openflowplugin-dev,
>>>>>>>>>>>>
>>>>>>>>>>>>     I’m currently running some scalability tests against the openflowplugin-li plugin, stable/lithium.
>>>>>>>>>>>>     Playing with the CSIT job, I was able to connect up to 1090 switches:
>>>>>>>>>>>>     https://git.opendaylight.org/gerrit/#/c/33213/
>>>>>>>>>>>>
>>>>>>>>>>>>     I’m now running the test against 40 OvS switches, each one of them in a docker container.
>>>>>>>>>>>>
>>>>>>>>>>>>     Connecting around 30 of them works fine, but then adding a new one breaks ODL completely; it goes crazy and
>>>>>>>>>>>>     becomes unresponsive.
>>>>>>>>>>>>     Attached is a snippet of the karaf.log with the log level set to DEBUG for org.opendaylight.openflowplugin, thus it’s a
>>>>>>>>>>>>     really big log (~2.5MB).
>>>>>>>>>>>>
>>>>>>>>>>>>     Here is what I observed based on the log:
>>>>>>>>>>>>     I have 30 switches connected, all works fine. Then I add a new 
>>>>>>>>>>>> one:
>>>>>>>>>>>>     - SalRoleServiceImpl starts doing its thing (2016-02-08 
>>>>>>>>>>>> 23:13:38,534)
>>>>>>>>>>>>     - RpcManagerImpl Registering Openflow RPCs (2016-02-08 
>>>>>>>>>>>> 23:13:38,546)
>>>>>>>>>>>>     - ConnectionAdapterImpl Hello received (2016-02-08 
>>>>>>>>>>>> 23:13:40,520)
>>>>>>>>>>>>     - Creation of the transaction chain, …
>>>>>>>>>>>>
>>>>>>>>>>>>     Then it all starts falling apart with this log:
>>>>>>>>>>>>>     2016-02-08 23:13:50,021 | DEBUG | ntLoopGroup-11-9 | 
>>>>>>>>>>>>> ConnectionContextImpl            | 190 -
>>>>>>>>>>>>>     org.opendaylight.openflowplugin.impl - 0.1.4.SNAPSHOT | 
>>>>>>>>>>>>> disconnecting:
>>>>>>>>>>>>>     node=/172.31.100.9:46736|auxId=0|connection state =
>>>>>>>>>>>>> RIP
>>>>>>>>>>>>     And then ConnectionContextImpl disconnects the switches one by one, RpcManagerImpl is unregistered,
>>>>>>>>>>>>     and it goes crazy for a while.
>>>>>>>>>>>>     But all I’ve done is add a new switch…
>>>>>>>>>>>>
>>>>>>>>>>>>     Finally, at 2016-02-08 23:14:26,666, exceptions are thrown:
>>>>>>>>>>>>>     2016-02-08 23:14:26,666 | ERROR | lt-dispatcher-85 | 
>>>>>>>>>>>>> LocalThreePhaseCommitCohort      | 172 -
>>>>>>>>>>>>>     org.opendaylight.controller.sal-distributed-datastore - 
>>>>>>>>>>>>> 1.2.4.SNAPSHOT | Failed to prepare transaction
>>>>>>>>>>>>>     member-1-chn-5-txn-180 on backend
>>>>>>>>>>>>>     akka.pattern.AskTimeoutException: Ask timed out on
>>>>>>>>>>>>>     [ActorSelection[Anchor(akka://opendaylight-cluster-data/),
>>>>>>>>>>>>>
>>>>>>>>>>>>> Path(/user/shardmanager-operational/member-1-shard-inventory-operational#-1518836725)]] after [30000 ms]
>>>>>>>>>>>>     And this goes on for a while.
>>>>>>>>>>>>
>>>>>>>>>>>>     Do you have any input on this?
>>>>>>>>>>>>
>>>>>>>>>>>>     Could you give some advice on how to scale? (I know disabling the StatisticsManager can help, for instance.)
>>>>>>>>>>>>
>>>>>>>>>>>>     Am I doing something wrong?
>>>>>>>>>>>>
>>>>>>>>>>>>     I can provide any requested information regarding the issue I’m facing.
>>>>>>>>>>>>
>>>>>>>>>>>>     Thanks,
>>>>>>>>>>>>     Alexis
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>
>>
>>
>>
>>
>

_______________________________________________
openflowplugin-dev mailing list
[email protected]
https://lists.opendaylight.org/mailman/listinfo/openflowplugin-dev
