Hello Shuva, I'm using the stable/lithium version of the plugin. As for my scenario, it's a single node, not a cluster. And yes, I'm installing 2 flows per switch.
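For reference, a quick way to confirm those two flows actually land on each switch is to dump them from inside the containers; a minimal sketch, where the container name ovs1 and the bridge name br0 are assumptions:

    # Check the flows present on the datapath of one emulated switch.
    docker exec ovs1 ovs-ofctl dump-flows br0

    # A flow can also be added directly on the switch as a sanity check,
    # independent of the controller path:
    docker exec ovs1 ovs-ofctl add-flow br0 "table=0,priority=100,dl_type=0x0800,actions=drop"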
Thanks,
Alexis

> On Mar 3, 2016, at 2:58 PM, Shuva Jyoti Kar <[email protected]> wrote:
>
> Hi Alexis,
>
> I understand that you are using the lithium model of the ofplugin, am I correct? Also, is it a clustered environment or a single node setup? Did you try installing some flows into each of the switches to check how they behave?
>
> Thanks
> Shuva
>
> Date: Wed, 2 Mar 2016 21:41:09 -0800
> From: Jamo Luhrsen <[email protected]>
> To: Alexis de Talhouët <[email protected]>, Abhijit Kumbhare <[email protected]>
> Cc: "[email protected]" <[email protected]>
> Subject: Re: [openflowplugin-dev] Scalability issues
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=utf-8
>
> On 02/19/2016 02:10 PM, Alexis de Talhouët wrote:
> > So far my results are:
> >
> > OVS 2.4.0: ODL configured with 2G of mem -> max is ~50 switches connected
> > OVS 2.3.1: ODL configured with 256MB of mem -> I currently have 150 switches connected, can't scale more due to infra limits.
>
> Alexis, I think this is probably worth putting a bugzilla up.
>
> How much horsepower do you need per docker ovs instance? We need to get this automated in CSIT. Marcus from ovsdb wants to do similar tests with ovsdb.
>
> JamO
>
> > I will pursue my testing next week.
> >
> > Thanks,
> > Alexis
> >
> >> On Feb 19, 2016, at 5:06 PM, Abhijit Kumbhare <[email protected]> wrote:
> >>
> >> Interesting. I wonder why that would be?
> >>
> >> On Fri, Feb 19, 2016 at 1:19 PM, Alexis de Talhouët <[email protected]> wrote:
> >>
> >> OVS 2.3.x scales fine.
> >> OVS 2.4.x doesn't scale well.
> >>
> >> Here is also the docker file for ovs 2.4.1.
> >>
> >>> On Feb 19, 2016, at 11:20 AM, Alexis de Talhouët <[email protected]> wrote:
> >>>
> >>>> can I use your containers? do you have any scripts/tools to bring things up/down?
> >>>
> >>> Sure, attached is a tar file containing all the scripts / config / dockerfile I'm using to set up docker containers emulating OvS.
> >>> FYI: it's ovs 2.3.0 and not 2.4.0 anymore.
> >>>
> >>> Also, forget about this whole mail thread: something in my private container must be breaking OVS behaviour, I don't know what yet.
> >>>
> >>> With the docker file attached here, I can scale to 90+ without any trouble...
> >>>
> >>> Thanks,
> >>> Alexis
> >>>
> >>> <ovs_scalability_setup.tar.gz>
> >>>
> >>>> On Feb 18, 2016, at 6:07 PM, Jamo Luhrsen <[email protected]> wrote:
> >>>>
> >>>> inline...
> >>>>
> >>>> On 02/18/2016 02:58 PM, Alexis de Talhouët wrote:
> >>>>> I'm running OVS 2.4, against stable/lithium, openflowplugin-li
> >>>>
> >>>> so this is one difference between CSIT and your setup, in addition to the whole containers vs mininet.
> >>>>
> >>>>> I never scaled up to 1k, this was in the CSIT job.
> >>>>> In a real scenario, I scaled to ~400. But that was even before clustering came into play in ofp lithium.
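For anyone trying to reproduce the container-per-switch setup, its shape is roughly the loop below. This is only a sketch: the image name ovs-node is an assumption (the real Dockerfile and helper scripts are in the ovs_scalability_setup.tar.gz tarball mentioned later in the thread), the bridge name br0 is arbitrary, and 192.168.1.159:6633 is the controller endpoint seen in the log excerpts.

    # Launch N containers, each running its own OVS, give each bridge a
    # distinct DPID, and point them all at the controller.
    CONTROLLER=192.168.1.159
    for i in $(seq 1 50); do
        docker run -d --privileged --name ovs$i ovs-node
        docker exec ovs$i ovs-vsctl add-br br0
        docker exec ovs$i ovs-vsctl set bridge br0 other-config:datapath-id=$(printf '%016x' $i)
        docker exec ovs$i ovs-vsctl set-controller br0 tcp:$CONTROLLER:6633
    done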
> >>>>>
> >>>>> I think the logs I sent have trace logging for openflowplugin and openflowjava; if that's not the case I can resubmit the logs.
> >>>>> I removed some of it in openflowjava because it was way too chatty (logging all message content between ovs <---> odl).
> >>>>>
> >>>>> Unfortunately those IOExceptions happen after the whole thing blows up. I was able to narrow down some logs in openflowjava to see the first disconnect event. As mentioned in a previous mail (in this mail thread), it's the device that is issuing the disconnect:
> >>>>>
> >>>>>> 2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | OFFrameDecoder | 201 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.6.4.SNAPSHOT | skipping bytebuf - too few bytes for header: 0 < 8
> >>>>>> 2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | OFVersionDetector | 201 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.6.4.SNAPSHOT | not enough data
> >>>>>> 2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | DelegatingInboundHandler | 201 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.6.4.SNAPSHOT | Channel inactive
> >>>>>> 2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | ConnectionAdapterImpl | 201 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.6.4.SNAPSHOT | ConsumeIntern msg on [id: 0x1efab5fb, /172.18.0.49:36983 :> /192.168.1.159:6633]
> >>>>>> 2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | ConnectionAdapterImpl | 201 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.6.4.SNAPSHOT | ConsumeIntern msg - DisconnectEvent
> >>>>>> 2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | ConnectionContextImpl | 205 - org.opendaylight.openflowplugin.impl - 0.1.4.SNAPSHOT | disconnecting: node=/172.18.0.49:36983|auxId=0|connection state = RIP
> >>>>>
> >>>>> Those logs come from another run, so they are not in the logs I sent earlier, although the behaviour is always the same.
> >>>>>
> >>>>> Regarding the memory, I don't want to add more than 2G of memory because, and I tested it, the more memory I add, the more I can scale. But as you pointed out, this issue is not an OOM error. Thus I'd rather fail at 2G (fewer docker containers to spawn each run, ~50).
> >>>>
> >>>> so, maybe reduce your memory then to simplify the reproducing steps. Since you know that increasing memory allows you to scale further but still hit the problem, let's make it easier to hit. How far can you go with the max mem set to 500M, if you are only loading ofp-li?
> >>>>
> >>>>> I definitely need some help here, because I can't sort myself out in the openflowplugin + openflowjava codebase... But I believe I already have Michal's attention :)
> >>>>
> >>>> can I use your containers? do you have any scripts/tools to bring things up/down? I might be able to try and reproduce myself.
> >>>> I like breaking things :)
> >>>>
> >>>> JamO
> >>>>
> >>>>> Thanks,
> >>>>> Alexis
> >>>>>
> >>>>>> On Feb 18, 2016, at 5:44 PM, Jamo Luhrsen <[email protected]> wrote:
> >>>>>>
> >>>>>> Alexis, don't worry about filing a bug just to give us a common place to work/comment, even if we close it later because of something outside of ODL. Email is fine too.
> >>>>>>
> >>>>>> what ovs version do you have in your containers? this test sounds great.
> >>>>>>
> >>>>>> Luis is right that if you were scaling well past 1k in the past, but now it falls over at 50, it sounds like a bug.
> >>>>>>
> >>>>>> Oh, you can try increasing the jvm max_mem from the default of 2G just as a data point. The fact that you don't get OOMs makes me think memory might not be the final bottleneck.
> >>>>>>
> >>>>>> you could enable debug/trace logs in the right modules (need ofp devs to tell us which) for a little more info.
> >>>>>>
> >>>>>> I've seen those IOExceptions before and always assumed it was from an OF switch doing a hard RST on its connection.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> JamO
> >>>>>>
> >>>>>> On 02/18/2016 11:48 AM, Luis Gomez wrote:
> >>>>>>> If the same test worked 6-8 months ago this seems like a bug, but please feel free to open it whenever you are sure.
> >>>>>>>
> >>>>>>>> On Feb 18, 2016, at 11:45 AM, Alexis de Talhouët <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>> Hello Luis,
> >>>>>>>>
> >>>>>>>> For sure I'm willing to open a bug, but first I want to make sure there is a bug and that I'm not doing something wrong.
> >>>>>>>> In ODL's infra, there is a test to find the maximum number of switches that can be connected to ODL, and this test reaches ~500 [0].
> >>>>>>>> I was able to scale up to 1090 switches [1] using the CSIT job in the sandbox.
> >>>>>>>> I believe the CSIT test is different in that the switches are emulated in one mininet VM, whereas I'm connecting OVS instances from separate containers.
> >>>>>>>>
> >>>>>>>> 6-8 months ago, I was able to perform the same test and scale with OVS docker containers up to ~400 before ODL started crashing (with some optimization done behind the scenes, i.e. ulimit, mem, cpu, GC...).
> >>>>>>>> Now I'm not able to scale past 100 with the same configuration.
> >>>>>>>>
> >>>>>>>> FYI: I just took a quick look at the CSIT test [0] karaf.log; it seems the test is actually failing but it is not correctly advertised: switch connections are dropped.
> >>>>>>>> Look for these:
> >>>>>>>> 2016-02-18 07:07:51,741 | WARN | entLoopGroup-6-6 | OFFrameDecoder | 181 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.6.4.SNAPSHOT | Unexpected exception from downstream.
> >>>>>>>> java.io.IOException: Connection reset by peer
> >>>>>>>>     at sun.nio.ch.FileDispatcherImpl.read0(Native Method)[:1.7.0_85]
> >>>>>>>>     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)[:1.7.0_85]
> >>>>>>>>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)[:1.7.0_85]
> >>>>>>>>     at sun.nio.ch.IOUtil.read(IOUtil.java:192)[:1.7.0_85]
> >>>>>>>>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:384)[:1.7.0_85]
> >>>>>>>>     at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311)[111:io.netty.buffer:4.0.26.Final]
> >>>>>>>>     at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)[111:io.netty.buffer:4.0.26.Final]
> >>>>>>>>     at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:241)[109:io.netty.transport:4.0.26.Final]
> >>>>>>>>     at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)[109:io.netty.transport:4.0.26.Final]
> >>>>>>>>     at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)[109:io.netty.transport:4.0.26.Final]
> >>>>>>>>     at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)[109:io.netty.transport:4.0.26.Final]
> >>>>>>>>     at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)[109:io.netty.transport:4.0.26.Final]
> >>>>>>>>     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:349)[109:io.netty.transport:4.0.26.Final]
> >>>>>>>>     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)[110:io.netty.common:4.0.26.Final]
> >>>>>>>>     at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)[110:io.netty.common:4.0.26.Final]
> >>>>>>>>     at java.lang.Thread.run(Thread.java:745)[:1.7.0_85]
> >>>>>>>>
> >>>>>>>> [0]: https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-1node-periodic-scalability-daily-only-stable-lithium/
> >>>>>>>> [1]: https://git.opendaylight.org/gerrit/#/c/33213/
> >>>>>>>>
> >>>>>>>>> On Feb 18, 2016, at 2:28 PM, Luis Gomez <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>> Alexis, thanks very much for sharing this test. Would you mind opening a bug with all this info so we can track it?
> >>>>>>>>>
> >>>>>>>>>> On Feb 18, 2016, at 7:29 AM, Alexis de Talhouët <[email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hi Michal,
> >>>>>>>>>>
> >>>>>>>>>> ODL memory is capped at 2 GB; the more memory I add, the more OVS I can connect. Regarding CPU, it's around 10-20% when connecting new OVS, with some peaks to 80%.
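The 2 GB discussed here (and the 500M experiment JamO suggested earlier) is the Karaf JVM heap cap. A minimal sketch of how that is typically changed for an OpenDaylight Karaf distribution, assuming the stock bin/setenv variables; treat the exact values as placeholders:

    # In <karaf-distro>/bin/setenv, or exported in the shell before starting Karaf:
    export JAVA_MIN_MEM=512m   # roughly the 500M data point JamO asked about
    export JAVA_MAX_MEM=512m
    # then restart the controller so the new heap settings take effect:
    ./bin/karaf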
> >>>>>>>>>>
> >>>>>>>>>> After some investigation, here is what I observed:
> >>>>>>>>>> Let's say I have 50 switches connected, stat manager disabled. I have one open socket per switch, plus an additional one for the controller.
> >>>>>>>>>> Then I connect a new switch (2016-02-18 09:35:08,059), 51 switches... something happens causing all connections to be dropped (by the device?) and then ODL tries to recreate them and goes into a crazy loop where it is never able to re-establish communication, but keeps creating new sockets.
> >>>>>>>>>> I'm suspecting something being garbage collected due to lack of memory, although there are no OOM errors.
> >>>>>>>>>>
> >>>>>>>>>> Attached are the YourKit Java Profiler analysis for the described scenario and the logs [1].
> >>>>>>>>>>
> >>>>>>>>>> Thanks,
> >>>>>>>>>> Alexis
> >>>>>>>>>>
> >>>>>>>>>> [1]: https://www.dropbox.com/sh/dgqeqv4j76zwbh3/AACim0za1fUozc7DlYJ4fsMJa?dl=0
> >>>>>>>>>>
> >>>>>>>>>>> On Feb 9, 2016, at 8:59 AM, Michal Rehak -X (mirehak - PANTHEON TECHNOLOGIES at Cisco) <[email protected]> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hi Alexis,
> >>>>>>>>>>> I am not sure how OVS uses threads; in the changelog there is some concurrency-related improvement in 2.1.3 and 2.3.
> >>>>>>>>>>> Also I guess docker can be constrained in terms of assigned resources.
> >>>>>>>>>>>
> >>>>>>>>>>> For you the most important thing is the number of cores used by the controller.
> >>>>>>>>>>>
> >>>>>>>>>>> What do your cpu and memory consumption look like when you connect all the OVSs?
> >>>>>>>>>>>
> >>>>>>>>>>> Regards,
> >>>>>>>>>>> Michal
> >>>>>>>>>>>
> >>>>>>>>>>> ________________________________________
> >>>>>>>>>>> From: Alexis de Talhouët <[email protected]>
> >>>>>>>>>>> Sent: Tuesday, February 9, 2016 14:44
> >>>>>>>>>>> To: Michal Rehak -X (mirehak - PANTHEON TECHNOLOGIES at Cisco)
> >>>>>>>>>>> Cc: [email protected]
> >>>>>>>>>>> Subject: Re: [openflowplugin-dev] Scalability issues
> >>>>>>>>>>>
> >>>>>>>>>>> Hello Michal,
> >>>>>>>>>>>
> >>>>>>>>>>> Yes, all the OvS instances I'm running have a unique DPID.
> >>>>>>>>>>>
> >>>>>>>>>>> Regarding the thread limit for netty, I'm running the test on a server that has 28 CPU(s).
> >>>>>>>>>>>
> >>>>>>>>>>> Is each OvS instance assigned its own thread?
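As an aside on the DPID question, uniqueness can be checked (or forced) per container from the OVS side; a small sketch, where the container name ovs1 and bridge br0 are assumptions matching the earlier loop:

    # The dpid the bridge advertises to the controller is on the first line:
    docker exec ovs1 ovs-ofctl show br0 | head -1

    # If two containers ever report the same dpid, it can be pinned explicitly:
    docker exec ovs1 ovs-vsctl set bridge br0 other-config:datapath-id=$(printf '%016x' 42)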
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>> Alexis
> >>>>>>>>>>>
> >>>>>>>>>>>> On Feb 9, 2016, at 3:42 AM, Michal Rehak -X (mirehak - PANTHEON TECHNOLOGIES at Cisco) <[email protected]> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hi Alexis,
> >>>>>>>>>>>> in the Li design the stats manager is not a standalone app but part of the core of ofPlugin. You can disable it via rpc.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Just a question regarding your ovs setup. Do you have all DPIDs unique?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Also there is a limit for netty in the form of the number of threads used. By default it uses 2 x cpu_cores_amount. You should have as many cores as possible in order to get max performance.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Regards,
> >>>>>>>>>>>> Michal
> >>>>>>>>>>>>
> >>>>>>>>>>>> ________________________________________
> >>>>>>>>>>>> From: [email protected] on behalf of Alexis de Talhouët <[email protected]>
> >>>>>>>>>>>> Sent: Tuesday, February 9, 2016 00:45
> >>>>>>>>>>>> To: [email protected]
> >>>>>>>>>>>> Subject: [openflowplugin-dev] Scalability issues
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hello openflowplugin-dev,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'm currently running some scalability tests against the openflowplugin-li plugin, stable/lithium.
> >>>>>>>>>>>> Playing with the CSIT job, I was able to connect up to 1090 switches:
> >>>>>>>>>>>> https://git.opendaylight.org/gerrit/#/c/33213/
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'm now running the test against 40 OvS switches, each one of them in a docker container.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Connecting around 30 of them works fine, but then adding a new one breaks ODL completely; it goes crazy and becomes unresponsive.
> >>>>>>>>>>>> Attached is a snippet of the karaf.log with the log set to DEBUG for org.opendaylight.openflowplugin, so it's a really big log (~2.5MB).
> >>>>>>>>>>>>
> >>>>>>>>>>>> Here is what I observed based on the log:
> >>>>>>>>>>>> I have 30 switches connected, all works fine.
> >>>>>>>>>>>> Then I add a new one:
> >>>>>>>>>>>> - SalRoleServiceImpl starts doing its thing (2016-02-08 23:13:38,534)
> >>>>>>>>>>>> - RpcManagerImpl registering Openflow RPCs (2016-02-08 23:13:38,546)
> >>>>>>>>>>>> - ConnectionAdapterImpl Hello received (2016-02-08 23:13:40,520)
> >>>>>>>>>>>> - Creation of the transaction chain, ...
> >>>>>>>>>>>>
> >>>>>>>>>>>> Then it all starts falling apart with this log:
> >>>>>>>>>>>>> 2016-02-08 23:13:50,021 | DEBUG | ntLoopGroup-11-9 | ConnectionContextImpl | 190 - org.opendaylight.openflowplugin.impl - 0.1.4.SNAPSHOT | disconnecting: node=/172.31.100.9:46736|auxId=0|connection state = RIP
> >>>>>>>>>>>> And then ConnectionContextImpl disconnects the switches one by one, RpcManagerImpl is unregistered, and it goes crazy for a while.
> >>>>>>>>>>>> But all I've done is add a new switch...
> >>>>>>>>>>>>
> >>>>>>>>>>>> Finally, at 2016-02-08 23:14:26,666, exceptions are thrown:
> >>>>>>>>>>>>> 2016-02-08 23:14:26,666 | ERROR | lt-dispatcher-85 | LocalThreePhaseCommitCohort | 172 - org.opendaylight.controller.sal-distributed-datastore - 1.2.4.SNAPSHOT | Failed to prepare transaction member-1-chn-5-txn-180 on backend
> >>>>>>>>>>>>> akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka://opendaylight-cluster-data/), Path(/user/shardmanager-operational/member-1-shard-inventory-operational#-1518836725)]] after [30000 ms]
> >>>>>>>>>>>> And it goes on for a while.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Do you have any input on this?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Could you give some advice on how to scale? (I know disabling the StatisticsManager can help, for instance.)
> >>>>>>>>>>>>
> >>>>>>>>>>>> Am I doing something wrong?
> >>>>>>>>>>>>
> >>>>>>>>>>>> I can provide any requested information regarding the issue I'm facing.
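On the DEBUG logging mentioned above: the relevant loggers can be raised and lowered at runtime from the Karaf console with the standard log:set command; the logger names below are just the package roots seen in the excerpts, and TRACE on openflowjava is very chatty since it can dump every OF message:

    # In the Karaf console (./bin/client or the karaf shell):
    log:set DEBUG org.opendaylight.openflowplugin
    log:set DEBUG org.opendaylight.openflowjava
    # back to defaults once the capture is done:
    log:set INFO org.opendaylight.openflowplugin
    log:set INFO org.opendaylight.openflowjava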
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks,
> >>>>>>>>>>>> Alexis
_______________________________________________
openflowplugin-dev mailing list
[email protected]
https://lists.opendaylight.org/mailman/listinfo/openflowplugin-dev
