If the same test worked 6-8 months ago, this looks like a regression, so it seems like a bug; please feel free to open one whenever you are sure.
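Before filing the bug, the two symptoms discussed further down this thread ("Connection reset by peer" and the `connection state = RIP` disconnects) can be tallied straight from karaf.log. A minimal sketch — the sample log it writes is fabricated for illustration; point `LOG` at a real karaf.log instead:

```shell
#!/bin/sh
# Sketch: count the two failure symptoms from this thread in a karaf.log.
# The sample log below is fake, just so the script runs anywhere; set
# LOG=data/log/karaf.log (or similar) on a real controller.
LOG="${LOG:-/tmp/sample-karaf.log}"
cat > "$LOG" <<'EOF'
2016-02-18 07:07:51,741 | WARN | entLoopGroup-6-6 | OFFrameDecoder | Unexpected exception from downstream.
java.io.IOException: Connection reset by peer
2016-02-08 23:13:50,021 | DEBUG | ntLoopGroup-11-9 | ConnectionContextImpl | disconnecting: node=/172.31.100.9:46736|auxId=0|connection state = RIP
java.io.IOException: Connection reset by peer
EOF

# grep -c counts matching lines, giving a quick per-symptom tally.
resets=$(grep -c 'Connection reset by peer' "$LOG")
drops=$(grep -c 'connection state = RIP' "$LOG")
echo "connection resets: $resets"
echo "dropped switch sessions: $drops"
```

Attaching counts like these (ideally bucketed by timestamp) to the bug would show whether the drops correlate with the moment the extra switch connects.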
> On Feb 18, 2016, at 11:45 AM, Alexis de Talhouët <[email protected]> wrote:
>
> Hello Luis,
>
> For sure I'm willing to open a bug, but first I want to make sure there is a bug and that I'm not doing something wrong.
> In ODL's infra, there is a test to find the maximum number of switches that can be connected to ODL, and this test reaches ~500 [0].
> I was able to scale up to 1090 switches [1] using the CSIT job in the sandbox.
> I believe the CSIT test is different in that the switches are emulated in one mininet VM, whereas I'm connecting OVS instances from separate containers.
>
> 6-8 months ago, I was able to perform the same test and scale with OVS docker containers up to ~400 before ODL started crashing (with some optimization done behind the scenes, i.e. ulimit, mem, cpu, GC…).
> Now I'm not able to scale past 100 with the same configuration.
>
> FYI: I just took a quick look at the CSIT test [0] karaf.log; it seems the test is actually failing but this is not correctly advertised… switch connections are dropped.
> Look for entries like this:
>
> 2016-02-18 07:07:51,741 | WARN | entLoopGroup-6-6 | OFFrameDecoder | 181 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.6.4.SNAPSHOT | Unexpected exception from downstream.
> java.io.IOException: Connection reset by peer
>     at sun.nio.ch.FileDispatcherImpl.read0(Native Method)[:1.7.0_85]
>     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)[:1.7.0_85]
>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)[:1.7.0_85]
>     at sun.nio.ch.IOUtil.read(IOUtil.java:192)[:1.7.0_85]
>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:384)[:1.7.0_85]
>     at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311)[111:io.netty.buffer:4.0.26.Final]
>     at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)[111:io.netty.buffer:4.0.26.Final]
>     at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:241)[109:io.netty.transport:4.0.26.Final]
>     at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)[109:io.netty.transport:4.0.26.Final]
>     at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)[109:io.netty.transport:4.0.26.Final]
>     at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)[109:io.netty.transport:4.0.26.Final]
>     at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)[109:io.netty.transport:4.0.26.Final]
>     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:349)[109:io.netty.transport:4.0.26.Final]
>     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)[110:io.netty.common:4.0.26.Final]
>     at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)[110:io.netty.common:4.0.26.Final]
>     at java.lang.Thread.run(Thread.java:745)[:1.7.0_85]
>
> [0]: https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-1node-periodic-scalability-daily-only-stable-lithium/
> [1]: https://git.opendaylight.org/gerrit/#/c/33213/
>
>> On Feb 18, 2016, at 2:28 PM, Luis Gomez <[email protected]> wrote:
>>
>> Alexis, thanks very much for sharing this test. Would you mind opening a bug with all this info so we can track it?
>>
>>> On Feb 18, 2016, at 7:29 AM, Alexis de Talhouët <[email protected]> wrote:
>>>
>>> Hi Michal,
>>>
>>> ODL memory is capped at 2 GB; the more memory I add, the more OVSs I can connect. Regarding CPU, it's around 10-20% when connecting new OVSs, with some peaks to 80%.
>>>
>>> After some investigation, here is what I observed:
>>> Let's say I have 50 switches connected, stats manager disabled. I have one open socket per switch, plus an additional one for the controller.
>>> Then I connect a new switch (2016-02-18 09:35:08,059), 51 switches… something happens that causes all connections to be dropped (by the device?), and then ODL tries to recreate them and goes into a crazy loop where it is never able to re-establish communication, but keeps creating new sockets.
>>> I suspect something is being garbage collected due to lack of memory, although there are no OOM errors.
>>>
>>> Attached are the YourKit Java Profiler analysis for the described scenario and the logs [1].
>>>
>>> Thanks,
>>> Alexis
>>>
>>> [1]: https://www.dropbox.com/sh/dgqeqv4j76zwbh3/AACim0za1fUozc7DlYJ4fsMJa?dl=0
>>>
>>>> On Feb 9, 2016, at 8:59 AM, Michal Rehak -X (mirehak - PANTHEON TECHNOLOGIES at Cisco) <[email protected]> wrote:
>>>>
>>>> Hi Alexis,
>>>> I am not sure how OVS uses threads - in the changelog there are some concurrency-related improvements in 2.1.3 and 2.3.
>>>> Also, I guess docker can be constrained regarding assigned resources.
>>>>
>>>> For you the most important thing is the number of cores used by the controller.
>>>>
>>>> What do your cpu and memory consumption look like when you connect all the OVSs?
>>>>
>>>> Regards,
>>>> Michal
>>>>
>>>> ________________________________________
>>>> From: Alexis de Talhouët <[email protected]>
>>>> Sent: Tuesday, February 9, 2016 14:44
>>>> To: Michal Rehak -X (mirehak - PANTHEON TECHNOLOGIES at Cisco)
>>>> Cc: [email protected]
>>>> Subject: Re: [openflowplugin-dev] Scalability issues
>>>>
>>>> Hello Michal,
>>>>
>>>> Yes, all the OvS instances I'm running have a unique DPID.
>>>>
>>>> Regarding the thread limit for netty, I'm running the test on a server that has 28 CPU(s).
>>>>
>>>> Is each OvS instance assigned its own thread?
>>>>
>>>> Thanks,
>>>> Alexis
>>>>
>>>>
>>>>> On Feb 9, 2016, at 3:42 AM, Michal Rehak -X (mirehak - PANTHEON TECHNOLOGIES at Cisco) <[email protected]> wrote:
>>>>>
>>>>> Hi Alexis,
>>>>> in the Li design the stats manager is not a standalone app but part of the core of ofPlugin. You can disable it via rpc.
>>>>>
>>>>> Just a question regarding your ovs setup. Do you have all DPIDs unique?
>>>>>
>>>>> Also, there is a limit for netty in the form of the number of threads used. By default it uses 2 x cpu_cores_amount. You should have as many cores as possible in order to get max performance.
>>>>>
>>>>> Regards,
>>>>> Michal
>>>>>
>>>>> ________________________________________
>>>>> From: [email protected] <[email protected]> on behalf of Alexis de Talhouët <[email protected]>
>>>>> Sent: Tuesday, February 9, 2016 00:45
>>>>> To: [email protected]
>>>>> Subject: [openflowplugin-dev] Scalability issues
>>>>>
>>>>> Hello openflowplugin-dev,
>>>>>
>>>>> I'm currently running some scalability tests against the openflowplugin-li plugin, stable/lithium.
>>>>> Playing with the CSIT job, I was able to connect up to 1090 switches: https://git.opendaylight.org/gerrit/#/c/33213/
>>>>>
>>>>> I'm now running the test against 40 OvS switches, each one of them in a docker container.
>>>>>
>>>>> Connecting around 30 of them works fine, but then adding a new one completely breaks ODL; it goes crazy and unresponsive.
>>>>> Attached is a snippet of the karaf.log with logging set to DEBUG for org.opendaylight.openflowplugin, so it's a really big log (~2.5MB).
>>>>>
>>>>> Here is what I observed based on the log:
>>>>> I have 30 switches connected, all works fine.
Then I add a new one:
>>>>> - SalRoleServiceImpl starts doing its thing (2016-02-08 23:13:38,534)
>>>>> - RpcManagerImpl Registering Openflow RPCs (2016-02-08 23:13:38,546)
>>>>> - ConnectionAdapterImpl Hello received (2016-02-08 23:13:40,520)
>>>>> - Creation of the transaction chain, …
>>>>>
>>>>> Then everything starts falling apart, with this log:
>>>>>> 2016-02-08 23:13:50,021 | DEBUG | ntLoopGroup-11-9 | ConnectionContextImpl | 190 - org.opendaylight.openflowplugin.impl - 0.1.4.SNAPSHOT | disconnecting: node=/172.31.100.9:46736|auxId=0|connection state = RIP
>>>>> And then ConnectionContextImpl disconnects the switches one by one, RpcManagerImpl is unregistered, and it goes crazy for a while.
>>>>> But all I've done is add a new switch..
>>>>>
>>>>> Finally, at 2016-02-08 23:14:26,666, exceptions are thrown:
>>>>>> 2016-02-08 23:14:26,666 | ERROR | lt-dispatcher-85 | LocalThreePhaseCommitCohort | 172 - org.opendaylight.controller.sal-distributed-datastore - 1.2.4.SNAPSHOT | Failed to prepare transaction member-1-chn-5-txn-180 on backend
>>>>>> akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka://opendaylight-cluster-data/), Path(/user/shardmanager-operational/member-1-shard-inventory-operational#-1518836725)]] after [30000 ms]
>>>>> And this goes on for a while.
>>>>>
>>>>> Do you have any input on this?
>>>>>
>>>>> Could you give some advice on how to scale? (I know disabling the StatisticsManager can help, for instance.)
>>>>>
>>>>> Am I doing something wrong?
>>>>>
>>>>> I can provide any requested information regarding the issue I'm facing.
>>>>>
>>>>> Thanks,
>>>>> Alexis
>>>>>
>>>
>>> _______________________________________________
>>> openflowplugin-dev mailing list
>>> [email protected]
>>> https://lists.opendaylight.org/mailman/listinfo/openflowplugin-dev
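Two of the quantitative claims in the thread (netty's default of 2 x cpu_cores threads, and "one open socket per switch") are easy to sanity-check on the controller host. A sketch under stated assumptions: `netty_default_threads` and `count_of_sessions` are made-up helper names, and the `ss` filter assumes switches connect to the default OpenFlow port 6633.

```shell
#!/bin/sh
# Netty event-loop sizing mentioned by Michal: 2 x available cores by default.
# (helper name is invented for this sketch)
netty_default_threads() {
    echo $(( $1 * 2 ))
}

# Count ESTABLISHED OpenFlow sessions from `ss`-style output on stdin.
# On a live controller you would pipe in:  ss -tn 'sport = :6633'
count_of_sessions() {
    grep -c 'ESTAB'
}

# The 28-CPU server mentioned in the thread -> 56 worker threads by default.
netty_default_threads 28

# Demo on fabricated ss output: 2 established sessions, 1 in TIME-WAIT.
printf '%s\n' \
  'ESTAB     0 0 10.0.0.1:6633 172.31.100.9:46736' \
  'ESTAB     0 0 10.0.0.1:6633 172.31.100.10:40112' \
  'TIME-WAIT 0 0 10.0.0.1:6633 172.31.100.11:40500' \
  | count_of_sessions
```

Polling the session count once a second while adding the 31st switch would show whether the sockets drop all at once (pointing at the controller) or trickle away (pointing at the devices).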
_______________________________________________
openflowplugin-dev mailing list
[email protected]
https://lists.opendaylight.org/mailman/listinfo/openflowplugin-dev
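[Editor's note] The tuning knobs the thread mentions in passing (ulimit, heap size, and the datastore ask-timeout behind the `AskTimeoutException ... after [30000 ms]`) all live in the Karaf distribution. A hedged sketch that writes to a throwaway directory so it is safe to run anywhere; the setenv variable names and the `operation-timeout-in-seconds` property are assumptions to verify against the ODL release actually in use:

```shell
#!/bin/sh
# Sketch of the tuning discussed in the thread. KARAF_HOME defaults to a
# temp dir here; on a real install it would be the distribution root.
KARAF_HOME="${KARAF_HOME:-$(mktemp -d)}"
mkdir -p "$KARAF_HOME/bin" "$KARAF_HOME/etc"

# 1. File descriptors: each connected switch holds at least one socket, so
#    run something like `ulimit -n 65536` in the shell that starts Karaf.

# 2. Heap above the 2 GB cap mentioned in the thread (variable names as
#    commonly used by Karaf's setenv script; verify for your release).
cat >> "$KARAF_HOME/bin/setenv" <<'EOF'
export JAVA_MIN_MEM=2048m
export JAVA_MAX_MEM=8192m
EOF

# 3. Raise the datastore ask-timeout that produced the AskTimeoutException
#    after [30000 ms] (property name is an assumption; check release docs).
cat >> "$KARAF_HOME/etc/org.opendaylight.controller.cluster.datastore.cfg" <<'EOF'
operation-timeout-in-seconds=120
EOF

# Show what was written.
grep -h '=' "$KARAF_HOME/bin/setenv" \
            "$KARAF_HOME/etc/org.opendaylight.controller.cluster.datastore.cfg"
```

Raising the ask-timeout only hides the symptom if the real problem is GC pressure, as Alexis suspects; the heap and profiler data should be checked first.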
