Hello Shuva, I'm using the stable/lithium version of the plugin. As for my scenario, it's a single node, not a cluster. And yes, I'm installing 2 flows per switch.
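For reference, a quick way to confirm those two flows actually land on each switch is to dump them from inside the containers; a minimal sketch, where the container name ovs1 and the bridge name br0 are assumptions:

    # Check the flows present on the datapath of one emulated switch.
    docker exec ovs1 ovs-ofctl dump-flows br0

    # A flow can also be added directly on the switch as a sanity check,
    # independent of the controller path:
    docker exec ovs1 ovs-ofctl add-flow br0 "table=0,priority=100,dl_type=0x0800,actions=drop"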
Thanks,
Alexis

> On Mar 3, 2016, at 2:58 PM, Shuva Jyoti Kar <[email protected]> wrote:
>
> Hi Alexis,
>
> I understand that you are using the lithium model of the ofplugin, am I correct? Also, is it a clustered environment or a single node setup? Did you try installing some flows into each of the switches to check how they behave?
>
> Thanks
> Shuva
>
> Date: Wed, 2 Mar 2016 21:41:09 -0800
> From: Jamo Luhrsen <[email protected]>
> To: Alexis de Talhouët <[email protected]>, Abhijit Kumbhare <[email protected]>
> Cc: "[email protected]" <[email protected]>
> Subject: Re: [openflowplugin-dev] Scalability issues
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=utf-8
>
> On 02/19/2016 02:10 PM, Alexis de Talhouët wrote:
> > So far my results are:
> >
> > OVS 2.4.0: ODL configured with 2G of mem -> max is ~50 switches connected
> > OVS 2.3.1: ODL configured with 256MB of mem -> I currently have 150 switches connected, can't scale more due to infra limits.
>
> Alexis, I think this is probably worth putting a bugzilla up.
>
> How much horsepower do you need per docker ovs instance? We need to get this automated in CSIT. Marcus from ovsdb wants to do similar tests with ovsdb.
>
> JamO
>
> > I will pursue my testing next week.
> >
> > Thanks,
> > Alexis
> >
> >> On Feb 19, 2016, at 5:06 PM, Abhijit Kumbhare <[email protected]> wrote:
> >>
> >> Interesting. I wonder why that would be?
> >>
> >> On Fri, Feb 19, 2016 at 1:19 PM, Alexis de Talhouët <[email protected]> wrote:
> >>
> >> OVS 2.3.x scales fine.
> >> OVS 2.4.x doesn't scale well.
> >>
> >> Here is also the docker file for ovs 2.4.1.
> >>
> >>> On Feb 19, 2016, at 11:20 AM, Alexis de Talhouët <[email protected]> wrote:
> >>>
> >>>> can I use your containers? do you have any scripts/tools to bring things up/down?
> >>>
> >>> Sure, attached is a tar file containing all the scripts / config / dockerfile I'm using to set up docker containers emulating OvS.
> >>> FYI: it's ovs 2.3.0 and not 2.4.0 anymore.
> >>>
> >>> Also, forget about this whole mail thread: something in my private container must be breaking OVS behaviour, I don't know what yet.
> >>>
> >>> With the docker file attached here, I can scale to 90+ without any trouble...
> >>>
> >>> Thanks,
> >>> Alexis
> >>>
> >>> <ovs_scalability_setup.tar.gz>
> >>>
> >>>> On Feb 18, 2016, at 6:07 PM, Jamo Luhrsen <[email protected]> wrote:
> >>>>
> >>>> inline...
> >>>>
> >>>> On 02/18/2016 02:58 PM, Alexis de Talhouët wrote:
> >>>>> I'm running OVS 2.4, against stable/lithium, openflowplugin-li
> >>>>
> >>>> so this is one difference between CSIT and your setup, in addition to the whole containers vs mininet.
> >>>>
> >>>>> I never scaled up to 1k, this was in the CSIT job.
> >>>>> In a real scenario, I scaled to ~400. But that was even before clustering came into play in ofp lithium.
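For anyone trying to reproduce the container-per-switch setup, its shape is roughly the loop below. This is only a sketch: the image name ovs-node is an assumption (the real Dockerfile and helper scripts are in the ovs_scalability_setup.tar.gz tarball mentioned later in the thread), the bridge name br0 is arbitrary, and 192.168.1.159:6633 is the controller endpoint seen in the log excerpts.

    # Launch N containers, each running its own OVS, give each bridge a
    # distinct DPID, and point them all at the controller.
    CONTROLLER=192.168.1.159
    for i in $(seq 1 50); do
        docker run -d --privileged --name ovs$i ovs-node
        docker exec ovs$i ovs-vsctl add-br br0
        docker exec ovs$i ovs-vsctl set bridge br0 other-config:datapath-id=$(printf '%016x' $i)
        docker exec ovs$i ovs-vsctl set-controller br0 tcp:$CONTROLLER:6633
    done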
> >>>>>
> >>>>> I think the logs I sent have trace logging for openflowplugin and openflowjava; if that's not the case I can resubmit the logs.
> >>>>> I removed some of it in openflowjava because it was way too chatty (logging all message content between ovs <---> odl).
> >>>>>
> >>>>> Unfortunately those IOExceptions happen after the whole thing blows up. I was able to narrow down some logs in openflowjava to see the first disconnect event. As mentioned in a previous mail (in this mail thread), it's the device that is issuing the disconnect:
> >>>>>
> >>>>>> 2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | OFFrameDecoder | 201 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.6.4.SNAPSHOT | skipping bytebuf - too few bytes for header: 0 < 8
> >>>>>> 2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | OFVersionDetector | 201 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.6.4.SNAPSHOT | not enough data
> >>>>>> 2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | DelegatingInboundHandler | 201 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.6.4.SNAPSHOT | Channel inactive
> >>>>>> 2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | ConnectionAdapterImpl | 201 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.6.4.SNAPSHOT | ConsumeIntern msg on [id: 0x1efab5fb, /172.18.0.49:36983 :> /192.168.1.159:6633]
> >>>>>> 2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | ConnectionAdapterImpl | 201 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.6.4.SNAPSHOT | ConsumeIntern msg - DisconnectEvent
> >>>>>> 2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | ConnectionContextImpl | 205 - org.opendaylight.openflowplugin.impl - 0.1.4.SNAPSHOT | disconnecting: node=/172.18.0.49:36983|auxId=0|connection state = RIP
> >>>>>
> >>>>> Those logs come from another run, so they are not in the logs I sent earlier, although the behaviour is always the same.
> >>>>>
> >>>>> Regarding the memory, I don't want to add more than 2G of memory because, and I tested it, the more memory I add, the more I can scale. But as you pointed out, this issue is not an OOM error. Thus I'd rather fail at 2G (fewer docker containers to spawn each run, ~50).
> >>>>
> >>>> so, maybe reduce your memory then to simplify the reproducing steps. Since you know that increasing memory allows you to scale further but still hit the problem, let's make it easier to hit. How far can you go with the max mem set to 500M, if you are only loading ofp-li?
> >>>>
> >>>>> I definitely need some help here, because I can't sort myself out in the openflowplugin + openflowjava codebase... But I believe I already have Michal's attention :)
> >>>>
> >>>> can I use your containers? do you have any scripts/tools to bring things up/down? I might be able to try and reproduce myself.
> >>>> I like breaking things :)
> >>>>
> >>>> JamO
> >>>>
> >>>>> Thanks,
> >>>>> Alexis
> >>>>>
> >>>>>> On Feb 18, 2016, at 5:44 PM, Jamo Luhrsen <[email protected]> wrote:
> >>>>>>
> >>>>>> Alexis, don't worry about filing a bug just to give us a common place to work/comment, even if we close it later because of something outside of ODL. Email is fine too.
> >>>>>>
> >>>>>> what ovs version do you have in your containers? this test sounds great.
> >>>>>>
> >>>>>> Luis is right that if you were scaling well past 1k in the past, but now it falls over at 50, it sounds like a bug.
> >>>>>>
> >>>>>> Oh, you can try increasing the jvm max_mem from the default of 2G just as a data point. The fact that you don't get OOMs makes me think memory might not be the final bottleneck.
> >>>>>>
> >>>>>> you could enable debug/trace logs in the right modules (need ofp devs to tell us which) for a little more info.
> >>>>>>
> >>>>>> I've seen those IOExceptions before and always assumed it was from an OF switch doing a hard RST on its connection.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> JamO
> >>>>>>
> >>>>>> On 02/18/2016 11:48 AM, Luis Gomez wrote:
> >>>>>>> If the same test worked 6-8 months ago this seems like a bug, but please feel free to open it whenever you are sure.
> >>>>>>>
> >>>>>>>> On Feb 18, 2016, at 11:45 AM, Alexis de Talhouët <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>> Hello Luis,
> >>>>>>>>
> >>>>>>>> For sure I'm willing to open a bug, but first I want to make sure there is a bug and that I'm not doing something wrong.
> >>>>>>>> In ODL's infra, there is a test to find the maximum number of switches that can be connected to ODL, and this test reaches ~500 [0].
> >>>>>>>> I was able to scale up to 1090 switches [1] using the CSIT job in the sandbox.
> >>>>>>>> I believe the CSIT test is different in that the switches are emulated in one mininet VM, whereas I'm connecting OVS instances from separate containers.
> >>>>>>>>
> >>>>>>>> 6-8 months ago, I was able to perform the same test and scale with OVS docker containers up to ~400 before ODL started crashing (with some optimization done behind the scenes, i.e. ulimit, mem, cpu, GC...).
> >>>>>>>> Now I'm not able to scale past 100 with the same configuration.
> >>>>>>>>
> >>>>>>>> FYI: I just took a quick look at the CSIT test [0] karaf.log; it seems the test is actually failing but it is not correctly advertised: switch connections are dropped.
> >>>>>>>> Look for these:
> >>>>>>>> 2016-02-18 07:07:51,741 | WARN | entLoopGroup-6-6 | OFFrameDecoder | 181 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.6.4.SNAPSHOT | Unexpected exception from downstream.
> >>>>>>>> java.io.IOException: Connection reset by peer
> >>>>>>>>     at sun.nio.ch.FileDispatcherImpl.read0(Native Method)[:1.7.0_85]
> >>>>>>>>     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)[:1.7.0_85]
> >>>>>>>>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)[:1.7.0_85]
> >>>>>>>>     at sun.nio.ch.IOUtil.read(IOUtil.java:192)[:1.7.0_85]
> >>>>>>>>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:384)[:1.7.0_85]
> >>>>>>>>     at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311)[111:io.netty.buffer:4.0.26.Final]
> >>>>>>>>     at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)[111:io.netty.buffer:4.0.26.Final]
> >>>>>>>>     at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:241)[109:io.netty.transport:4.0.26.Final]
> >>>>>>>>     at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)[109:io.netty.transport:4.0.26.Final]
> >>>>>>>>     at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)[109:io.netty.transport:4.0.26.Final]
> >>>>>>>>     at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)[109:io.netty.transport:4.0.26.Final]
> >>>>>>>>     at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)[109:io.netty.transport:4.0.26.Final]
> >>>>>>>>     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:349)[109:io.netty.transport:4.0.26.Final]
> >>>>>>>>     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)[110:io.netty.common:4.0.26.Final]
> >>>>>>>>     at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)[110:io.netty.common:4.0.26.Final]
> >>>>>>>>     at java.lang.Thread.run(Thread.java:745)[:1.7.0_85]
> >>>>>>>>
> >>>>>>>> [0]: https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-1node-periodic-scalability-daily-only-stable-lithium/
> >>>>>>>> [1]: https://git.opendaylight.org/gerrit/#/c/33213/
> >>>>>>>>
> >>>>>>>>> On Feb 18, 2016, at 2:28 PM, Luis Gomez <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>> Alexis, thanks very much for sharing this test. Would you mind opening a bug with all this info so we can track it?
> >>>>>>>>>
> >>>>>>>>>> On Feb 18, 2016, at 7:29 AM, Alexis de Talhouët <[email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hi Michal,
> >>>>>>>>>>
> >>>>>>>>>> ODL memory is capped at 2 GB; the more memory I add, the more OVS I can connect. Regarding CPU, it's around 10-20% when connecting new OVS, with some peaks to 80%.
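The 2 GB discussed here (and the 500M experiment JamO suggested earlier) is the Karaf JVM heap cap. A minimal sketch of how that is typically changed for an OpenDaylight Karaf distribution, assuming the stock bin/setenv variables; treat the exact values as placeholders:

    # In <karaf-distro>/bin/setenv, or exported in the shell before starting Karaf:
    export JAVA_MIN_MEM=512m   # roughly the 500M data point JamO asked about
    export JAVA_MAX_MEM=512m
    # then restart the controller so the new heap settings take effect:
    ./bin/karaf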
> >>>>>>>>>>
> >>>>>>>>>> After some investigation, here is what I observed:
> >>>>>>>>>> Let's say I have 50 switches connected, stat manager disabled. I have one open socket per switch, plus an additional one for the controller.
> >>>>>>>>>> Then I connect a new switch (2016-02-18 09:35:08,059), 51 switches... something happens causing all connections to be dropped (by the device?) and then ODL tries to recreate them and goes into a crazy loop where it is never able to re-establish communication, but keeps creating new sockets.
> >>>>>>>>>> I'm suspecting something being garbage collected due to lack of memory, although there are no OOM errors.
> >>>>>>>>>>
> >>>>>>>>>> Attached are the YourKit Java Profiler analysis for the described scenario and the logs [1].
> >>>>>>>>>>
> >>>>>>>>>> Thanks,
> >>>>>>>>>> Alexis
> >>>>>>>>>>
> >>>>>>>>>> [1]: https://www.dropbox.com/sh/dgqeqv4j76zwbh3/AACim0za1fUozc7DlYJ4fsMJa?dl=0
> >>>>>>>>>>
> >>>>>>>>>>> On Feb 9, 2016, at 8:59 AM, Michal Rehak -X (mirehak - PANTHEON TECHNOLOGIES at Cisco) <[email protected]> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hi Alexis,
> >>>>>>>>>>> I am not sure how OVS uses threads; in the changelog there is some concurrency-related improvement in 2.1.3 and 2.3.
> >>>>>>>>>>> Also I guess docker can be constrained in terms of assigned resources.
> >>>>>>>>>>>
> >>>>>>>>>>> For you the most important thing is the number of cores used by the controller.
> >>>>>>>>>>>
> >>>>>>>>>>> What do your cpu and memory consumption look like when you connect all the OVSs?
> >>>>>>>>>>>
> >>>>>>>>>>> Regards,
> >>>>>>>>>>> Michal
> >>>>>>>>>>>
> >>>>>>>>>>> ________________________________________
> >>>>>>>>>>> From: Alexis de Talhouët <[email protected]>
> >>>>>>>>>>> Sent: Tuesday, February 9, 2016 14:44
> >>>>>>>>>>> To: Michal Rehak -X (mirehak - PANTHEON TECHNOLOGIES at Cisco)
> >>>>>>>>>>> Cc: [email protected]
> >>>>>>>>>>> Subject: Re: [openflowplugin-dev] Scalability issues
> >>>>>>>>>>>
> >>>>>>>>>>> Hello Michal,
> >>>>>>>>>>>
> >>>>>>>>>>> Yes, all the OvS instances I'm running have a unique DPID.
> >>>>>>>>>>>
> >>>>>>>>>>> Regarding the thread limit for netty, I'm running the test on a server that has 28 CPU(s).
> >>>>>>>>>>>
> >>>>>>>>>>> Is each OvS instance assigned its own thread?
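As an aside on the DPID question, uniqueness can be checked (or forced) per container from the OVS side; a small sketch, where the container name ovs1 and bridge br0 are assumptions matching the earlier loop:

    # The dpid the bridge advertises to the controller is on the first line:
    docker exec ovs1 ovs-ofctl show br0 | head -1

    # If two containers ever report the same dpid, it can be pinned explicitly:
    docker exec ovs1 ovs-vsctl set bridge br0 other-config:datapath-id=$(printf '%016x' 42)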
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>> Alexis
> >>>>>>>>>>>
> >>>>>>>>>>>> On Feb 9, 2016, at 3:42 AM, Michal Rehak -X (mirehak - PANTHEON TECHNOLOGIES at Cisco) <[email protected]> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hi Alexis,
> >>>>>>>>>>>> in the Li design the stats manager is not a standalone app but part of the core of ofPlugin. You can disable it via rpc.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Just a question regarding your ovs setup. Do you have all DPIDs unique?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Also there is a limit for netty in the form of the number of threads used. By default it uses 2 x cpu_cores_amount. You should have as many cores as possible in order to get max performance.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Regards,
> >>>>>>>>>>>> Michal
> >>>>>>>>>>>>
> >>>>>>>>>>>> ________________________________________
> >>>>>>>>>>>> From: [email protected] on behalf of Alexis de Talhouët <[email protected]>
> >>>>>>>>>>>> Sent: Tuesday, February 9, 2016 00:45
> >>>>>>>>>>>> To: [email protected]
> >>>>>>>>>>>> Subject: [openflowplugin-dev] Scalability issues
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hello openflowplugin-dev,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'm currently running some scalability tests against the openflowplugin-li plugin, stable/lithium.
> >>>>>>>>>>>> Playing with the CSIT job, I was able to connect up to 1090 switches:
> >>>>>>>>>>>> https://git.opendaylight.org/gerrit/#/c/33213/
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'm now running the test against 40 OvS switches, each one of them in a docker container.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Connecting around 30 of them works fine, but then adding a new one breaks ODL completely; it goes crazy and becomes unresponsive.
> >>>>>>>>>>>> Attached is a snippet of the karaf.log with the log set to DEBUG for org.opendaylight.openflowplugin, so it's a really big log (~2.5MB).
> >>>>>>>>>>>>
> >>>>>>>>>>>> Here is what I observed based on the log:
> >>>>>>>>>>>> I have 30 switches connected, all works fine.
> >>>>>>>>>>>> Then I add a new one:
> >>>>>>>>>>>> - SalRoleServiceImpl starts doing its thing (2016-02-08 23:13:38,534)
> >>>>>>>>>>>> - RpcManagerImpl registering Openflow RPCs (2016-02-08 23:13:38,546)
> >>>>>>>>>>>> - ConnectionAdapterImpl Hello received (2016-02-08 23:13:40,520)
> >>>>>>>>>>>> - Creation of the transaction chain, ...
> >>>>>>>>>>>>
> >>>>>>>>>>>> Then it all starts falling apart with this log:
> >>>>>>>>>>>>> 2016-02-08 23:13:50,021 | DEBUG | ntLoopGroup-11-9 | ConnectionContextImpl | 190 - org.opendaylight.openflowplugin.impl - 0.1.4.SNAPSHOT | disconnecting: node=/172.31.100.9:46736|auxId=0|connection state = RIP
> >>>>>>>>>>>> And then ConnectionContextImpl disconnects the switches one by one, RpcManagerImpl is unregistered, and it goes crazy for a while.
> >>>>>>>>>>>> But all I've done is add a new switch...
> >>>>>>>>>>>>
> >>>>>>>>>>>> Finally, at 2016-02-08 23:14:26,666, exceptions are thrown:
> >>>>>>>>>>>>> 2016-02-08 23:14:26,666 | ERROR | lt-dispatcher-85 | LocalThreePhaseCommitCohort | 172 - org.opendaylight.controller.sal-distributed-datastore - 1.2.4.SNAPSHOT | Failed to prepare transaction member-1-chn-5-txn-180 on backend
> >>>>>>>>>>>>> akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka://opendaylight-cluster-data/), Path(/user/shardmanager-operational/member-1-shard-inventory-operational#-1518836725)]] after [30000 ms]
> >>>>>>>>>>>> And it goes on for a while.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Do you have any input on this?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Could you give some advice on how to scale? (I know disabling the StatisticsManager can help, for instance.)
> >>>>>>>>>>>>
> >>>>>>>>>>>> Am I doing something wrong?
> >>>>>>>>>>>>
> >>>>>>>>>>>> I can provide any requested information regarding the issue I'm facing.
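On the DEBUG logging mentioned above: the relevant loggers can be raised and lowered at runtime from the Karaf console with the standard log:set command; the logger names below are just the package roots seen in the excerpts, and TRACE on openflowjava is very chatty since it can dump every OF message:

    # In the Karaf console (./bin/client or the karaf shell):
    log:set DEBUG org.opendaylight.openflowplugin
    log:set DEBUG org.opendaylight.openflowjava
    # back to defaults once the capture is done:
    log:set INFO org.opendaylight.openflowplugin
    log:set INFO org.opendaylight.openflowjava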
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks,
> >>>>>>>>>>>> Alexis
_______________________________________________
openflowplugin-dev mailing list
[email protected]
https://lists.opendaylight.org/mailman/listinfo/openflowplugin-dev
