I did, and started something in int/test, but I haven't had the time to finish it.
https://bugs.opendaylight.org/show_bug.cgi?id=5464
https://git.opendaylight.org/gerrit/#/c/35813/

I agree with the serious problem with OVS 2.4, but right now I'm trying to solve a FD leak in netconf :)

Thanks,
Alexis

> On Mar 15, 2016, at 2:27 PM, Luis Gomez <[email protected]> wrote:
>
> Alexis, did you open a bug with all the information for this? We are
> releasing Be SR1 and I believe we still have serious perf issues with OVS 2.4.
>
> BR/Luis
>
>> On Mar 4, 2016, at 4:56 PM, Jamo Luhrsen <[email protected]> wrote:
>>
>> Alexis,
>>
>> thanks for the bug and the patch, and keep up the good work digging at
>> openflowplugin.
>>
>> JamO
>>
>> On 03/04/2016 07:38 AM, Alexis de Talhouët wrote:
>>> JamO,
>>>
>>> Here is the bug: https://bugs.opendaylight.org/show_bug.cgi?id=5464
>>> Here is the patch in int/test: https://git.opendaylight.org/gerrit/#/c/35813/
>>> It is still WIP. And yes, I believe we should have a CSIT job running the test.
>>>
>>> Thanks,
>>> Alexis
>>>
>>>> On Mar 3, 2016, at 12:41 AM, Jamo Luhrsen <[email protected]> wrote:
>>>>
>>>> On 02/19/2016 02:10 PM, Alexis de Talhouët wrote:
>>>>> So far my results are:
>>>>>
>>>>> OVS 2.4.0: ODL configured with 2G of mem -> max is ~50 switches connected
>>>>> OVS 2.3.1: ODL configured with 256MB of mem -> I currently have 150 switches
>>>>> connected, can't scale more due to infra limits.
>>>>
>>>> Alexis, I think this is probably worth putting a bugzilla up.
>>>>
>>>> How much horsepower do you need per docker ovs instance? We need to get this
>>>> automated in CSIT. Marcus from ovsdb wants to do similar tests with ovsdb.
>>>>
>>>> JamO
>>>>
>>>>> I will pursue my testing next week.
>>>>>
>>>>> Thanks,
>>>>> Alexis
>>>>>
>>>>>> On Feb 19, 2016, at 5:06 PM, Abhijit Kumbhare <[email protected]> wrote:
>>>>>>
>>>>>> Interesting. I wonder why that would be?
>>>>>>
>>>>>> On Fri, Feb 19, 2016 at 1:19 PM, Alexis de Talhouët <[email protected]> wrote:
>>>>>>
>>>>>> OVS 2.3.x scales fine.
>>>>>> OVS 2.4.x doesn't scale well.
>>>>>>
>>>>>> Here is also the docker file for OVS 2.4.1.
>>>>>>
>>>>>>> On Feb 19, 2016, at 11:20 AM, Alexis de Talhouët <[email protected]> wrote:
>>>>>>>
>>>>>>>> can I use your containers? do you have any scripts/tools to bring things up/down?
>>>>>>>
>>>>>>> Sure, attached is a tar file containing all the scripts / config / Dockerfile I'm using
>>>>>>> to set up docker containers emulating OvS.
>>>>>>> FYI: it's OVS 2.3.0 and not 2.4.0 anymore.
>>>>>>>
>>>>>>> Also, forget about this whole mail thread, something in my private container must be
>>>>>>> breaking OVS behaviour, I don't know what yet.
>>>>>>>
>>>>>>> With the docker file attached here, I can scale to 90+ without any trouble...
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Alexis
>>>>>>>
>>>>>>> <ovs_scalability_setup.tar.gz>
>>>>>>>
>>>>>>>> On Feb 18, 2016, at 6:07 PM, Jamo Luhrsen <[email protected]> wrote:
>>>>>>>>
>>>>>>>> inline...
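(Side note on the container setup: the attached ovs_scalability_setup.tar.gz is not reproduced here, but a minimal loop for bringing up N docker-based OVS instances against a controller would look roughly like the sketch below. The image name, bridge name and controller address are placeholders, not the actual scripts.)

    #!/bin/sh
    # Rough sketch only: spawn N containers, each running one OVS, and point each
    # bridge at the controller. "my-ovs-image" and "br0" are placeholders.
    CONTROLLER_IP=192.168.1.159     # assumption: host where ODL listens on 6633
    NUM_SWITCHES=50

    for i in $(seq 1 "$NUM_SWITCHES"); do
        docker run -d --privileged --name "ovs$i" my-ovs-image
        # give every bridge a unique, predictable datapath-id (see the DPID question later in the thread)
        docker exec "ovs$i" ovs-vsctl add-br br0 \
            -- set bridge br0 other-config:datapath-id="$(printf '%016x' "$i")"
        docker exec "ovs$i" ovs-vsctl set-controller br0 "tcp:$CONTROLLER_IP:6633"
    done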
>>>>>>>>
>>>>>>>> On 02/18/2016 02:58 PM, Alexis de Talhouët wrote:
>>>>>>>>> I'm running OVS 2.4, against stable/lithium, openflowplugin-li
>>>>>>>>
>>>>>>>> so this is one difference between CSIT and your setup, in addition to the whole
>>>>>>>> containers vs mininet.
>>>>>>>>
>>>>>>>>> I never scaled up to 1k, this was in the CSIT job.
>>>>>>>>> In a real scenario, I scaled to ~400. But that was even before clustering came into play
>>>>>>>>> in ofp lithium.
>>>>>>>>>
>>>>>>>>> I think the logs I sent have trace logging for openflowplugin and openflowjava; if that's
>>>>>>>>> not the case I can resubmit the logs.
>>>>>>>>> I removed some of them in openflowjava because it was way too chatty (logging the content
>>>>>>>>> of all messages between ovs <---> odl).
>>>>>>>>>
>>>>>>>>> Unfortunately those IOExceptions happen after the whole thing blows up. I was able to
>>>>>>>>> narrow down some logs in openflowjava to see the first disconnect event. As mentioned in
>>>>>>>>> a previous mail (in this mail thread), it's the device that is issuing the disconnect:
>>>>>>>>>
>>>>>>>>>> 2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | OFFrameDecoder | 201 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.6.4.SNAPSHOT | skipping bytebuf - too few bytes for header: 0 < 8
>>>>>>>>>> 2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | OFVersionDetector | 201 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.6.4.SNAPSHOT | not enough data
>>>>>>>>>> 2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | DelegatingInboundHandler | 201 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.6.4.SNAPSHOT | Channel inactive
>>>>>>>>>> 2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | ConnectionAdapterImpl | 201 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.6.4.SNAPSHOT | ConsumeIntern msg on [id: 0x1efab5fb, /172.18.0.49:36983 => /192.168.1.159:6633]
>>>>>>>>>> 2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | ConnectionAdapterImpl | 201 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.6.4.SNAPSHOT | ConsumeIntern msg - DisconnectEvent
>>>>>>>>>> 2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | ConnectionContextImpl | 205 - org.opendaylight.openflowplugin.impl - 0.1.4.SNAPSHOT | disconnecting: node=/172.18.0.49:36983|auxId=0|connection state = RIP
>>>>>>>>>
>>>>>>>>> Those logs come from another run, so they are not in the logs I sent earlier, although
>>>>>>>>> the behaviour is always the same.
>>>>>>>>>
>>>>>>>>> Regarding the memory, I don't want to add more than 2G, because (and I tested it) the
>>>>>>>>> more memory I add, the further I can scale. But as you pointed out, this issue is not an
>>>>>>>>> OOM error. Thus I'd rather fail at 2G (fewer docker containers to spawn each run, ~50).
>>>>>>>>
>>>>>>>> so, maybe reduce your memory then to simplify the reproducing steps. Since you know that
>>>>>>>> increasing memory allows you to scale further but still hit the problem, let's make it
>>>>>>>> easier to hit. How far can you go with the max mem set to 500M, if you are only loading
>>>>>>>> ofp-li?
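(For reproducing with less memory and more logging, as suggested above, a hedged sketch of the knobs involved: JAVA_MAX_MEM is honoured by the Karaf start scripts in typical ODL distributions, and the logger names come from the excerpts in this thread; paths and values are assumptions, adjust to your install.)

    # Sketch only -- values and paths are assumptions.
    # 1) Cap the controller heap (e.g. ~500M as JamO suggests) before starting karaf:
    export JAVA_MAX_MEM=512m
    ./bin/karaf

    # 2) From the karaf> console, raise logging for the modules seen in the log excerpts:
    #      log:set DEBUG org.opendaylight.openflowplugin
    #      log:set TRACE org.opendaylight.openflowjava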
>>>>>>>>
>>>>>>>>> I definitely need some help here, because I can't sort myself out in the
>>>>>>>>> openflowplugin + openflowjava codebase… But I believe I already have Michal's attention :)
>>>>>>>>
>>>>>>>> can I use your containers? do you have any scripts/tools to bring things up/down?
>>>>>>>> I might be able to try and reproduce myself. I like breaking things :)
>>>>>>>>
>>>>>>>> JamO
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Alexis
>>>>>>>>>
>>>>>>>>>> On Feb 18, 2016, at 5:44 PM, Jamo Luhrsen <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> Alexis, don't worry about filing a bug just to give us a common place to work/comment,
>>>>>>>>>> even if we close it later because of something outside of ODL. Email is fine too.
>>>>>>>>>>
>>>>>>>>>> what ovs version do you have in your containers? this test sounds great.
>>>>>>>>>>
>>>>>>>>>> Luis is right that if you were scaling well past 1k in the past, but now it falls over
>>>>>>>>>> at 50, it sounds like a bug.
>>>>>>>>>>
>>>>>>>>>> Oh, you can try increasing the jvm max_mem from the default of 2G just as a data point.
>>>>>>>>>> The fact that you don't get OOMs makes me think memory might not be the final bottleneck.
>>>>>>>>>>
>>>>>>>>>> you could enable debug/trace logs in the right modules (need ofp devs to tell us that)
>>>>>>>>>> for a little more info.
>>>>>>>>>>
>>>>>>>>>> I've seen those IOExceptions before and always assumed it was from an OF switch doing a
>>>>>>>>>> hard RST on its connection.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> JamO
>>>>>>>>>>
>>>>>>>>>> On 02/18/2016 11:48 AM, Luis Gomez wrote:
>>>>>>>>>>> If the same test worked 6-8 months ago this seems like a bug, but please feel free to
>>>>>>>>>>> open it whenever you are sure.
>>>>>>>>>>>
>>>>>>>>>>>> On Feb 18, 2016, at 11:45 AM, Alexis de Talhouët <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hello Luis,
>>>>>>>>>>>>
>>>>>>>>>>>> For sure I'm willing to open a bug, but first I want to make sure there is a bug and
>>>>>>>>>>>> that I'm not doing something wrong.
>>>>>>>>>>>> In ODL's infra, there is a test to find the maximum number of switches that can be
>>>>>>>>>>>> connected to ODL, and this test reaches ~500 [0].
>>>>>>>>>>>> I was able to scale up to 1090 switches [1] using the CSIT job in the sandbox.
>>>>>>>>>>>> I believe the CSIT test is different in that the switches are emulated in one mininet
>>>>>>>>>>>> VM, whereas I'm connecting OVS instances from separate containers.
>>>>>>>>>>>>
>>>>>>>>>>>> 6-8 months ago, I was able to perform the same test and scale with OVS docker
>>>>>>>>>>>> containers up to ~400 before ODL started crashing (with some optimization done behind
>>>>>>>>>>>> the scenes, i.e. ulimit, mem, cpu, GC…).
>>>>>>>>>>>> Now I'm not able to scale beyond 100 with the same configuration.
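(A quick, hedged way to sanity-check this kind of run from the controller host: count the established OpenFlow sessions and check the file-descriptor limit, one of the "ulimit, mem, cpu, GC" knobs mentioned above. Port 6633 is the OpenFlow port seen in the logs in this thread.)

    ss -tn state established '( sport = :6633 )' | wc -l   # established switch connections (+1 for the header line)
    ulimit -n                                               # current open-files limit for this shell
    ulimit -n 65535                                         # raise it before starting karaf, if the system allows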
>>>>>>>>>>>>
>>>>>>>>>>>> FYI: I just quickly looked at the CSIT test [0] karaf.log; it seems the test is
>>>>>>>>>>>> actually failing but it is not correctly advertised… switch connections are dropped.
>>>>>>>>>>>> Look for these:
>>>>>>>>>>>>
>>>>>>>>>>>> 2016-02-18 07:07:51,741 | WARN | entLoopGroup-6-6 | OFFrameDecoder | 181 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.6.4.SNAPSHOT | Unexpected exception from downstream.
>>>>>>>>>>>> java.io.IOException: Connection reset by peer
>>>>>>>>>>>> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)[:1.7.0_85]
>>>>>>>>>>>> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)[:1.7.0_85]
>>>>>>>>>>>> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)[:1.7.0_85]
>>>>>>>>>>>> at sun.nio.ch.IOUtil.read(IOUtil.java:192)[:1.7.0_85]
>>>>>>>>>>>> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:384)[:1.7.0_85]
>>>>>>>>>>>> at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311)[111:io.netty.buffer:4.0.26.Final]
>>>>>>>>>>>> at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)[111:io.netty.buffer:4.0.26.Final]
>>>>>>>>>>>> at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:241)[109:io.netty.transport:4.0.26.Final]
>>>>>>>>>>>> at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)[109:io.netty.transport:4.0.26.Final]
>>>>>>>>>>>> at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)[109:io.netty.transport:4.0.26.Final]
>>>>>>>>>>>> at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)[109:io.netty.transport:4.0.26.Final]
>>>>>>>>>>>> at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)[109:io.netty.transport:4.0.26.Final]
>>>>>>>>>>>> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:349)[109:io.netty.transport:4.0.26.Final]
>>>>>>>>>>>> at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)[110:io.netty.common:4.0.26.Final]
>>>>>>>>>>>> at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)[110:io.netty.common:4.0.26.Final]
>>>>>>>>>>>> at java.lang.Thread.run(Thread.java:745)[:1.7.0_85]
>>>>>>>>>>>>
>>>>>>>>>>>> [0]: https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-1node-periodic-scalability-daily-only-stable-lithium/
>>>>>>>>>>>> [1]: https://git.opendaylight.org/gerrit/#/c/33213/
>>>>>>>>>>>>
>>>>>>>>>>>>> On Feb 18, 2016, at 2:28 PM, Luis Gomez <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Alexis, thanks very much for sharing this test. Would you mind opening a bug with all
>>>>>>>>>>>>> this info so we can track this?
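(A hedged, quick way to spot this in a CSIT karaf.log: grep for the two messages quoted in this thread; the counts give a rough idea of how many connections were reset or declared dead.)

    grep -c 'java.io.IOException: Connection reset by peer' karaf.log
    grep -c 'connection state = RIP' karaf.log    # disconnect events logged by ConnectionContextImpl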
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Feb 18, 2016, at 7:29 AM, Alexis de Talhouët <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Michal,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ODL memory is capped at 2GB; the more memory I add, the more OVS I can connect.
>>>>>>>>>>>>>> Regarding CPU, it's around 10-20% when connecting new OVS, with some peaks to 80%.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> After some investigation, here is what I observed:
>>>>>>>>>>>>>> Let's say I have 50 switches connected, stats manager disabled. I have one open socket
>>>>>>>>>>>>>> per switch, plus an additional one for the controller.
>>>>>>>>>>>>>> Then I connect a new switch (2016-02-18 09:35:08,059), 51 switches… something happens
>>>>>>>>>>>>>> that causes all connections to be dropped (by the device?), and then ODL tries to
>>>>>>>>>>>>>> recreate them and goes into a crazy loop where it is never able to re-establish
>>>>>>>>>>>>>> communication, but keeps creating new sockets.
>>>>>>>>>>>>>> I'm suspecting something being garbage collected due to lack of memory, although there
>>>>>>>>>>>>>> are no OOM errors.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Attached are the YourKit Java Profiler analysis for the described scenario and the logs [1].
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Alexis
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1]: https://www.dropbox.com/sh/dgqeqv4j76zwbh3/AACim0za1fUozc7DlYJ4fsMJa?dl=0
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Feb 9, 2016, at 8:59 AM, Michal Rehak -X (mirehak - PANTHEON TECHNOLOGIES at Cisco) <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Alexis,
>>>>>>>>>>>>>>> I am not sure how OVS uses threads - in the changelog there are some concurrency-related
>>>>>>>>>>>>>>> improvements in 2.1.3 and 2.3.
>>>>>>>>>>>>>>> Also I guess docker can be forced regarding assigned resources.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> For you the most important thing is the number of cores used by the controller.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What do your cpu and memory consumption look like when you connect all the OVSs?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Michal
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>>>>> From: Alexis de Talhouët <[email protected]>
>>>>>>>>>>>>>>> Sent: Tuesday, February 9, 2016 14:44
>>>>>>>>>>>>>>> To: Michal Rehak -X (mirehak - PANTHEON TECHNOLOGIES at Cisco)
>>>>>>>>>>>>>>> Cc: [email protected]
>>>>>>>>>>>>>>> Subject: Re: [openflowplugin-dev] Scalability issues
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hello Michal,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes, all the OvS instances I'm running have a unique DPID.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regarding the thread limit for netty, I'm running the tests on a server that has 28 CPU(s).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is each OvS instance assigned its own thread?
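(Two quick, hedged checks related to the exchange above: whether every container's bridge really has a unique DPID, and how many cores / netty event-loop threads the controller actually has. The container names ovs1..ovsN and bridge br0 are assumptions; the "LoopGroup" pattern matches the truncated "entLoopGroup" thread names in the karaf logs.)

    # Any output from uniq -d means two containers share a datapath-id.
    for c in $(docker ps --format '{{.Names}}' | grep '^ovs'); do
        docker exec "$c" ovs-vsctl get bridge br0 datapath-id
    done | sort | uniq -d

    # Cores visible on the controller host, and a rough count of netty event-loop threads
    # (the default sizing discussed below is 2 x cores, so ~56 on a 28-CPU box).
    nproc
    jstack "$(pgrep -f karaf)" | grep -c 'LoopGroup'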
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Alexis
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Feb 9, 2016, at 3:42 AM, Michal Rehak -X (mirehak - PANTHEON TECHNOLOGIES at Cisco) <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Alexis,
>>>>>>>>>>>>>>>> in the Li design the stats manager is not a standalone app but part of the core of
>>>>>>>>>>>>>>>> ofPlugin. You can disable it via rpc.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Just a question regarding your ovs setup: do you have all DPIDs unique?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Also there is a limit for netty in the form of the number of threads used. By default
>>>>>>>>>>>>>>>> it uses 2 x cpu_cores_amount. You should have as many cores as possible in order to
>>>>>>>>>>>>>>>> get max performance.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>> Michal
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>>>>>> From: [email protected] on behalf of Alexis de Talhouët <[email protected]>
>>>>>>>>>>>>>>>> Sent: Tuesday, February 9, 2016 00:45
>>>>>>>>>>>>>>>> To: [email protected]
>>>>>>>>>>>>>>>> Subject: [openflowplugin-dev] Scalability issues
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hello openflowplugin-dev,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'm currently running some scalability tests against the openflowplugin-li plugin,
>>>>>>>>>>>>>>>> stable/lithium.
>>>>>>>>>>>>>>>> Playing with the CSIT job, I was able to connect up to 1090 switches:
>>>>>>>>>>>>>>>> https://git.opendaylight.org/gerrit/#/c/33213/
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'm now running the test against 40 OvS switches, each one of them in a docker container.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Connecting around 30 of them works fine, but then adding a new one completely breaks
>>>>>>>>>>>>>>>> ODL; it goes crazy and becomes unresponsive.
>>>>>>>>>>>>>>>> Attached is a snippet of the karaf.log with the log level set to DEBUG for
>>>>>>>>>>>>>>>> org.opendaylight.openflowplugin, thus it's a really big log (~2.5MB).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Here is what I observed based on the log:
>>>>>>>>>>>>>>>> I have 30 switches connected, all works fine.
>>>>>>>>>>>>>>>> Then I add a new one:
>>>>>>>>>>>>>>>> - SalRoleServiceImpl starts doing its thing (2016-02-08 23:13:38,534)
>>>>>>>>>>>>>>>> - RpcManagerImpl Registering Openflow RPCs (2016-02-08 23:13:38,546)
>>>>>>>>>>>>>>>> - ConnectionAdapterImpl Hello received (2016-02-08 23:13:40,520)
>>>>>>>>>>>>>>>> - Creation of the transaction chain, …
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Then it all starts falling apart with this log:
>>>>>>>>>>>>>>>>> 2016-02-08 23:13:50,021 | DEBUG | ntLoopGroup-11-9 | ConnectionContextImpl | 190 - org.opendaylight.openflowplugin.impl - 0.1.4.SNAPSHOT | disconnecting: node=/172.31.100.9:46736|auxId=0|connection state = RIP
>>>>>>>>>>>>>>>> And then ConnectionContextImpl disconnects the switches one by one, RpcManagerImpl is
>>>>>>>>>>>>>>>> unregistered, and then it goes crazy for a while.
>>>>>>>>>>>>>>>> But all I've done is add a new switch…
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Finally, at 2016-02-08 23:14:26,666, exceptions are thrown:
>>>>>>>>>>>>>>>>> 2016-02-08 23:14:26,666 | ERROR | lt-dispatcher-85 | LocalThreePhaseCommitCohort | 172 - org.opendaylight.controller.sal-distributed-datastore - 1.2.4.SNAPSHOT | Failed to prepare transaction member-1-chn-5-txn-180 on backend
>>>>>>>>>>>>>>>>> akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka://opendaylight-cluster-data/), Path(/user/shardmanager-operational/member-1-shard-inventory-operational#-1518836725)]] after [30000 ms]
>>>>>>>>>>>>>>>> And it goes on for a while.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Do you have any input on this?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Could you give some advice on how to scale? (I know disabling the StatisticsManager
>>>>>>>>>>>>>>>> can help, for instance.)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Am I doing something wrong?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I can provide any requested information regarding the issue I'm facing.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Alexis
_______________________________________________
openflowplugin-dev mailing list
[email protected]
https://lists.opendaylight.org/mailman/listinfo/openflowplugin-dev
