Alexis, did you open a bug with all the information for this? We are releasing Be SR1 and I believe we still have serious perf issues with OVS 2.4.
BR/Luis

> On Mar 4, 2016, at 4:56 PM, Jamo Luhrsen <[email protected]> wrote:
>
> Alexis,
>
> thanks for the bug and the patch, and keep up the good work digging at
> openflowplugin.
>
> JamO
>
> On 03/04/2016 07:38 AM, Alexis de Talhouët wrote:
>> JamO,
>>
>> Here is the bug: https://bugs.opendaylight.org/show_bug.cgi?id=5464
>> Here is the patch in int/test: https://git.opendaylight.org/gerrit/#/c/35813/
>> It is still WIP. And yes, I believe we should have a CSIT job running the test.
>>
>> Thanks,
>> Alexis
>>
>>> On Mar 3, 2016, at 12:41 AM, Jamo Luhrsen <[email protected]> wrote:
>>>
>>> On 02/19/2016 02:10 PM, Alexis de Talhouët wrote:
>>>> So far my results are:
>>>>
>>>> OVS 2.4.0: ODL configured with 2G of mem -> max is ~50 switches connected
>>>> OVS 2.3.1: ODL configured with 256MB of mem -> I currently have 150
>>>> switches connected, can't scale more due to infra limits.
>>>
>>> Alexis, I think this is probably worth putting a bugzilla up.
>>>
>>> How much horsepower do you need per docker ovs instance? We need to get this
>>> automated in CSIT. Marcus from ovsdb wants to do similar tests with ovsdb.
>>>
>>> JamO
>>>
>>>> I will pursue my testing next week.
>>>>
>>>> Thanks,
>>>> Alexis
>>>>
>>>>> On Feb 19, 2016, at 5:06 PM, Abhijit Kumbhare <[email protected]> wrote:
>>>>>
>>>>> Interesting. I wonder why that would be?
>>>>>
>>>>> On Fri, Feb 19, 2016 at 1:19 PM, Alexis de Talhouët <[email protected]> wrote:
>>>>>
>>>>> OVS 2.3.x scales fine.
>>>>> OVS 2.4.x doesn't scale well.
>>>>>
>>>>> Here is also the docker file for ovs 2.4.1
>>>>>
>>>>>> On Feb 19, 2016, at 11:20 AM, Alexis de Talhouët <[email protected]> wrote:
>>>>>>
>>>>>>> can I use your containers?
>>>>>>> do you have any scripts/tools to bring things up/down?
>>>>>>
>>>>>> Sure, attached is a tar file containing all the scripts / config / dockerfile
>>>>>> I'm using to set up docker containers emulating OvS.
>>>>>> FYI: it's ovs 2.3.0 and not 2.4.0 anymore.
>>>>>>
>>>>>> Also, forget about this whole mail thread; something in my private
>>>>>> container must be breaking OVS behaviour, I don't know what yet.
>>>>>>
>>>>>> With the docker file attached here, I can scale to 90+ without any trouble...
>>>>>>
>>>>>> Thanks,
>>>>>> Alexis
>>>>>>
>>>>>> <ovs_scalability_setup.tar.gz>
>>>>>>
>>>>>>> On Feb 18, 2016, at 6:07 PM, Jamo Luhrsen <[email protected]> wrote:
>>>>>>>
>>>>>>> inline...
>>>>>>>
>>>>>>> On 02/18/2016 02:58 PM, Alexis de Talhouët wrote:
>>>>>>>> I'm running OVS 2.4, against stable/lithium, openflowplugin-li
>>>>>>>
>>>>>>> so this is one difference between CSIT and your setup, in addition to the whole
>>>>>>> containers vs mininet.
>>>>>>>
>>>>>>>> I never scaled up to 1k, that was in the CSIT job.
>>>>>>>> In a real scenario, I scaled to ~400. But that was even before
>>>>>>>> clustering came into play in ofp lithium.
>>>>>>>>
>>>>>>>> I think the logs I sent have trace logging for openflowplugin and
>>>>>>>> openflowjava; if that's not the case I can resubmit the logs.
>>>>>>>> I removed some of it in openflowjava because it was way too chatty
>>>>>>>> (logging the content of all messages between ovs <---> odl).
>>>>>>>>
>>>>>>>> Unfortunately those IOExceptions happen after the whole thing blows
>>>>>>>> up. I was able to narrow down some logs in openflowjava
>>>>>>>> to see the first disconnect event.
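The container-per-switch setup described above can be sketched roughly as follows. This is only a sketch: the image name (`my-ovs-image`) and the controller address are placeholders, not the contents of the attached tarball, and with the default `DRY_RUN=1` the script only prints the commands it would run:

```shell
#!/bin/sh
# Sketch of spinning up N docker-based OvS instances and pointing each at
# the controller. "my-ovs-image" and the controller address are placeholders.
# DRY_RUN=1 (the default) prints commands instead of executing them.
CONTROLLER="tcp:192.168.1.159:6633"
N="${N:-3}"

run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "$@"; else "$@"; fi; }

i=1
while [ "$i" -le "$N" ]; do
  # one OVS instance per container; each bridge ends up with its own DPID
  run docker run -d --name "ovs$i" --privileged my-ovs-image
  run docker exec "ovs$i" ovs-vsctl add-br br0
  run docker exec "ovs$i" ovs-vsctl set-controller br0 "$CONTROLLER"
  i=$((i + 1))
done
```

The `ovs-vsctl add-br` / `set-controller` pair is the standard way to attach a bridge to an OpenFlow controller; everything docker-related here is an assumption about how the tarball's scripts are shaped.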
>>>>>>>> As mentioned in a previous mail (in this mail thread), it's the device that is
>>>>>>>> issuing the disconnect:
>>>>>>>>
>>>>>>>>> 2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | OFFrameDecoder | 201 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.6.4.SNAPSHOT | skipping bytebuf - too few bytes for header: 0 < 8
>>>>>>>>> 2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | OFVersionDetector | 201 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.6.4.SNAPSHOT | not enough data
>>>>>>>>> 2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | DelegatingInboundHandler | 201 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.6.4.SNAPSHOT | Channel inactive
>>>>>>>>> 2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | ConnectionAdapterImpl | 201 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.6.4.SNAPSHOT | ConsumeIntern msg on [id: 0x1efab5fb, /172.18.0.49:36983 :> /192.168.1.159:6633]
>>>>>>>>> 2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | ConnectionAdapterImpl | 201 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.6.4.SNAPSHOT | ConsumeIntern msg - DisconnectEvent
>>>>>>>>> 2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | ConnectionContextImpl | 205 - org.opendaylight.openflowplugin.impl - 0.1.4.SNAPSHOT | disconnecting: node=/172.18.0.49:36983|auxId=0|connection state = RIP
>>>>>>>>
>>>>>>>> Those logs come from another run, so they are not in the logs I sent
>>>>>>>> earlier, although the behaviour is always the same.
>>>>>>>>
>>>>>>>> Regarding the memory, I don't want to add more than 2G, because (and I
>>>>>>>> tested it) the more memory I add, the more I can scale.
>>>>>>>> But as you pointed out,
>>>>>>>> this issue is not an OOM error. Thus I'd rather fail at 2G (fewer
>>>>>>>> docker containers to spawn each run, ~50).
>>>>>>>
>>>>>>> so, maybe reduce your memory then, to simplify the reproducing steps.
>>>>>>> Since you know that increasing memory allows you to scale further but
>>>>>>> you still hit the problem, let's make it easier to hit. how far
>>>>>>> can you go with the max mem set to 500M, if you are only loading ofp-li?
>>>>>>>
>>>>>>>> I definitely need some help here, because I can't sort myself out in
>>>>>>>> the openflowplugin + openflowjava codebase…
>>>>>>>> But I believe I already have Michal's attention :)
>>>>>>>
>>>>>>> can I use your containers? do you have any scripts/tools to bring
>>>>>>> things up/down?
>>>>>>> I might be able to try and reproduce myself. I like breaking things :)
>>>>>>>
>>>>>>> JamO
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Alexis
>>>>>>>>
>>>>>>>>> On Feb 18, 2016, at 5:44 PM, Jamo Luhrsen <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Alexis, don't worry about filing a bug just to give us a common
>>>>>>>>> place to work/comment, even if we close it later because of something
>>>>>>>>> outside of ODL. Email is fine too.
>>>>>>>>>
>>>>>>>>> what ovs version do you have in your containers? this test sounds great.
>>>>>>>>>
>>>>>>>>> Luis is right that if you were scaling well past 1k in the past,
>>>>>>>>> but now it falls over at 50, it sounds like a bug.
>>>>>>>>>
>>>>>>>>> Oh, you can try increasing the jvm max_mem from the default of 2G just
>>>>>>>>> as a data point. The fact that you don't get OOMs makes me think memory
>>>>>>>>> might not be the final bottleneck.
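As a data point for the max_mem suggestion above: in the ODL Karaf distribution the heap ceiling is normally raised by editing `bin/setenv` before starting the controller. A sketch only; the variable names follow the Karaf conventions and are worth double-checking against the Lithium distribution actually in use:

```shell
# bin/setenv -- sourced by bin/karaf at startup.
# Raise the max heap from the 2G used in these tests, just as a data point.
export JAVA_MAX_MEM=4096m
# Java 7 is in use here (per the stack traces), so permgen may also need room:
export JAVA_MAX_PERM_MEM=512m
```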
>>>>>>>>> you could enable debug/trace logs in the right modules (need ofp
>>>>>>>>> devs to tell us that) for a little more info.
>>>>>>>>>
>>>>>>>>> I've seen those IOExceptions before and always assumed it was from
>>>>>>>>> an OF switch doing a hard RST on its connection.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> JamO
>>>>>>>>>
>>>>>>>>> On 02/18/2016 11:48 AM, Luis Gomez wrote:
>>>>>>>>>> If the same test worked 6-8 months ago this seems like a bug, but
>>>>>>>>>> please feel free to open it whenever you are sure.
>>>>>>>>>>
>>>>>>>>>>> On Feb 18, 2016, at 11:45 AM, Alexis de Talhouët <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hello Luis,
>>>>>>>>>>>
>>>>>>>>>>> For sure I'm willing to open a bug, but first I want to make sure
>>>>>>>>>>> there is a bug and that I'm not doing something wrong.
>>>>>>>>>>> In ODL's infra, there is a test to find the maximum number of
>>>>>>>>>>> switches that can be connected to ODL, and this test
>>>>>>>>>>> reaches ~500 [0].
>>>>>>>>>>> I was able to scale up to 1090 switches [1] using the CSIT job in
>>>>>>>>>>> the sandbox.
>>>>>>>>>>> I believe the CSIT test is different in that the switches are
>>>>>>>>>>> emulated in one mininet VM, whereas I'm connecting OVS
>>>>>>>>>>> instances from separate containers.
>>>>>>>>>>>
>>>>>>>>>>> 6-8 months ago, I was able to perform the same test and scale
>>>>>>>>>>> with OVS docker containers up to ~400 before ODL started
>>>>>>>>>>> crashing (with some optimization done behind the scenes, i.e.
>>>>>>>>>>> ulimit, mem, cpu, GC…).
>>>>>>>>>>> Now I'm not able to scale past 100 with the same configuration.
>>>>>>>>>>> FYI: I just quickly looked at the CSIT test [0] karaf.log; it seems
>>>>>>>>>>> the test is actually failing but that is not correctly
>>>>>>>>>>> advertised… switch connections are dropped.
>>>>>>>>>>> Look for these:
>>>>>>>>>>> 2016-02-18 07:07:51,741 | WARN | entLoopGroup-6-6 | OFFrameDecoder | 181 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.6.4.SNAPSHOT | Unexpected exception from downstream.
>>>>>>>>>>> java.io.IOException: Connection reset by peer
>>>>>>>>>>> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)[:1.7.0_85]
>>>>>>>>>>> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)[:1.7.0_85]
>>>>>>>>>>> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)[:1.7.0_85]
>>>>>>>>>>> at sun.nio.ch.IOUtil.read(IOUtil.java:192)[:1.7.0_85]
>>>>>>>>>>> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:384)[:1.7.0_85]
>>>>>>>>>>> at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311)[111:io.netty.buffer:4.0.26.Final]
>>>>>>>>>>> at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)[111:io.netty.buffer:4.0.26.Final]
>>>>>>>>>>> at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:241)[109:io.netty.transport:4.0.26.Final]
>>>>>>>>>>> at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)[109:io.netty.transport:4.0.26.Final]
>>>>>>>>>>> at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)[109:io.netty.transport:4.0.26.Final]
>>>>>>>>>>> at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)[109:io.netty.transport:4.0.26.Final]
>>>>>>>>>>> at
>>>>>>>>>>> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)[109:io.netty.transport:4.0.26.Final]
>>>>>>>>>>> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:349)[109:io.netty.transport:4.0.26.Final]
>>>>>>>>>>> at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)[110:io.netty.common:4.0.26.Final]
>>>>>>>>>>> at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)[110:io.netty.common:4.0.26.Final]
>>>>>>>>>>> at java.lang.Thread.run(Thread.java:745)[:1.7.0_85]
>>>>>>>>>>>
>>>>>>>>>>> [0]: https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-1node-periodic-scalability-daily-only-stable-lithium/
>>>>>>>>>>> [1]: https://git.opendaylight.org/gerrit/#/c/33213/
>>>>>>>>>>>
>>>>>>>>>>>> On Feb 18, 2016, at 2:28 PM, Luis Gomez <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Alexis, thanks very much for sharing this test. Would you mind
>>>>>>>>>>>> opening a bug with all this info so we can track this?
>>>>>>>>>>>>
>>>>>>>>>>>>> On Feb 18, 2016, at 7:29 AM, Alexis de Talhouët <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Michal,
>>>>>>>>>>>>>
>>>>>>>>>>>>> ODL memory is capped at 2GB; the more memory I add, the more
>>>>>>>>>>>>> OVS I can connect. Regarding CPU, it's around 10-20%
>>>>>>>>>>>>> when connecting new OVS, with some peaks to 80%.
>>>>>>>>>>>>>
>>>>>>>>>>>>> After some investigation, here is what I observed:
>>>>>>>>>>>>> Let's say I have 50 switches connected, stats manager disabled.
>>>>>>>>>>>>> I have one open socket per switch, plus an additional
>>>>>>>>>>>>> one for the controller.
>>>>>>>>>>>>> Then I connect a new switch (2016-02-18 09:35:08,059), 51
>>>>>>>>>>>>> switches… something happens causing all connections to
>>>>>>>>>>>>> be dropped (by the device?) and then ODL
>>>>>>>>>>>>> tries to recreate them and goes into a crazy loop where it is never
>>>>>>>>>>>>> able to re-establish communication, but keeps creating new sockets.
>>>>>>>>>>>>> I suspect something is being garbage collected due to lack of
>>>>>>>>>>>>> memory, although there are no OOM errors.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Attached are the YourKit Java Profiler analysis for the described
>>>>>>>>>>>>> scenario and the logs [1].
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Alexis
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1]: https://www.dropbox.com/sh/dgqeqv4j76zwbh3/AACim0za1fUozc7DlYJ4fsMJa?dl=0
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Feb 9, 2016, at 8:59 AM, Michal Rehak -X (mirehak - PANTHEON TECHNOLOGIES at Cisco) <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Alexis,
>>>>>>>>>>>>>> I am not sure how OVS uses threads - in the changelog there are
>>>>>>>>>>>>>> some concurrency-related improvements in 2.1.3 and 2.3.
>>>>>>>>>>>>>> Also I guess docker can be constrained regarding assigned resources.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For you, the most important thing is the number of cores used by
>>>>>>>>>>>>>> the controller.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What do your cpu and memory consumption look like when you
>>>>>>>>>>>>>> connect all the OVSs?
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Michal
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>>>> From: Alexis de Talhouët <[email protected]>
>>>>>>>>>>>>>> Sent: Tuesday, February 9, 2016 14:44
>>>>>>>>>>>>>> To: Michal Rehak -X (mirehak - PANTHEON TECHNOLOGIES at Cisco)
>>>>>>>>>>>>>> Cc: [email protected]
>>>>>>>>>>>>>> Subject: Re: [openflowplugin-dev] Scalability issues
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello Michal,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, all the OvS instances I'm running have unique DPIDs.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regarding the thread limit for netty, I'm running the tests on a
>>>>>>>>>>>>>> server that has 28 CPU(s).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is each OvS instance assigned its own thread?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Alexis
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Feb 9, 2016, at 3:42 AM, Michal Rehak -X (mirehak - PANTHEON TECHNOLOGIES at Cisco) <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Alexis,
>>>>>>>>>>>>>>> in the Li design the stats manager is not a standalone app but
>>>>>>>>>>>>>>> part of the core of ofPlugin. You can disable it via rpc.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Just a question regarding your ovs setup. Do you have all
>>>>>>>>>>>>>>> DPIDs unique?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also there is a limit for netty in the form of the number of
>>>>>>>>>>>>>>> threads used. By default it uses 2 x cpu_cores_amount.
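On disabling the stats manager via rpc, as mentioned above: against a Lithium controller this would typically go through RESTCONF. A sketch only; both the RPC path (`statistics-manager-control:change-statistics-work-mode`) and the `FULLYDISABLED` mode value are recalled from the openflowplugin model and should be verified against the running distribution before use:

```shell
# Assumption: the RPC path and mode value below need verifying against the
# statistics-manager-control model shipped with openflowplugin-li.
URL="http://localhost:8181/restconf/operations/statistics-manager-control:change-statistics-work-mode"
BODY='{"input": {"mode": "FULLYDISABLED"}}'
# Against a live controller (default credentials admin/admin):
#   curl -u admin:admin -H "Content-Type: application/json" -X POST "$URL" -d "$BODY"
echo "POST $URL"
```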
>>>>>>>>>>>>>>> You should have as many cores as possible in order to get max performance.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Michal
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>>>>> From: [email protected] on behalf of Alexis de Talhouët <[email protected]>
>>>>>>>>>>>>>>> Sent: Tuesday, February 9, 2016 00:45
>>>>>>>>>>>>>>> To: [email protected]
>>>>>>>>>>>>>>> Subject: [openflowplugin-dev] Scalability issues
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hello openflowplugin-dev,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm currently running some scalability tests against the
>>>>>>>>>>>>>>> openflowplugin-li plugin, stable/lithium.
>>>>>>>>>>>>>>> Playing with the CSIT job, I was able to connect up to 1090
>>>>>>>>>>>>>>> switches: https://git.opendaylight.org/gerrit/#/c/33213/
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm now running the test against 40 OvS switches, each one of
>>>>>>>>>>>>>>> them in a docker container.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Connecting around 30 of them works fine, but then adding a
>>>>>>>>>>>>>>> new one completely breaks ODL; it goes crazy and becomes unresponsive.
>>>>>>>>>>>>>>> Attached is a snippet of the karaf.log with the log level set to DEBUG for
>>>>>>>>>>>>>>> org.opendaylight.openflowplugin, so it's a really big log (~2.5MB).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here is what I observed based on the log:
>>>>>>>>>>>>>>> I have 30 switches connected, all works fine. Then I add a new one:
>>>>>>>>>>>>>>> - SalRoleServiceImpl starts doing its thing (2016-02-08 23:13:38,534)
>>>>>>>>>>>>>>> - RpcManagerImpl Registering Openflow RPCs (2016-02-08 23:13:38,546)
>>>>>>>>>>>>>>> - ConnectionAdapterImpl Hello received (2016-02-08 23:13:40,520)
>>>>>>>>>>>>>>> - Creation of the transaction chain, …
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Then everything starts falling apart with this log:
>>>>>>>>>>>>>>>> 2016-02-08 23:13:50,021 | DEBUG | ntLoopGroup-11-9 | ConnectionContextImpl | 190 - org.opendaylight.openflowplugin.impl - 0.1.4.SNAPSHOT | disconnecting: node=/172.31.100.9:46736|auxId=0|connection state = RIP
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> And then ConnectionContextImpl disconnects the switches one by one,
>>>>>>>>>>>>>>> RpcManagerImpl is unregistered, and it goes crazy for a while.
>>>>>>>>>>>>>>> But all I've done is add a new switch...
>>>>>>>>>>>>>>> Finally, at 2016-02-08 23:14:26,666, exceptions are thrown:
>>>>>>>>>>>>>>>> 2016-02-08 23:14:26,666 | ERROR | lt-dispatcher-85 | LocalThreePhaseCommitCohort | 172 - org.opendaylight.controller.sal-distributed-datastore - 1.2.4.SNAPSHOT | Failed to prepare transaction member-1-chn-5-txn-180 on backend
>>>>>>>>>>>>>>>> akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka://opendaylight-cluster-data/), Path(/user/shardmanager-operational/member-1-shard-inventory-operational#-1518836725)]] after [30000 ms]
>>>>>>>>>>>>>>> And this goes on for a while.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Do you have any input on this?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Could you give some advice on how to scale? (I know
>>>>>>>>>>>>>>> disabling the StatisticsManager can help, for instance.)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Am I doing something wrong?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I can provide any requested information regarding the issue I'm facing.
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Alexis

_______________________________________________
openflowplugin-dev mailing list
[email protected]
https://lists.opendaylight.org/mailman/listinfo/openflowplugin-dev
