Alexis, thanks for the bug and the patch, and keep up the good work digging into openflowplugin.
JamO

On 03/04/2016 07:38 AM, Alexis de Talhouët wrote:
> JamO,
>
> Here is the bug: https://bugs.opendaylight.org/show_bug.cgi?id=5464
> Here is the patch in int/test: https://git.opendaylight.org/gerrit/#/c/35813/
> It is still WIP. And yes, I believe we should have a CSIT job running the test.
>
> Thanks,
> Alexis
>
>> On Mar 3, 2016, at 12:41 AM, Jamo Luhrsen <[email protected]> wrote:
>>
>> On 02/19/2016 02:10 PM, Alexis de Talhouët wrote:
>>> So far my results are:
>>>
>>> OVS 2.4.0: ODL configured with 2G of mem -> max is ~50 switches connected
>>> OVS 2.3.1: ODL configured with 256MB of mem -> I currently have 150 switches
>>> connected, can't scale more due to infra limits.
>>
>> Alexis, I think this is probably worth putting a bugzilla up.
>>
>> How much horsepower do you need per docker ovs instance? We need to get this
>> automated in CSIT. Marcus from ovsdb wants to do similar tests with ovsdb.
>>
>> JamO
>>
>>> I will pursue my testing next week.
>>>
>>> Thanks,
>>> Alexis
>>>
>>>> On Feb 19, 2016, at 5:06 PM, Abhijit Kumbhare <[email protected]> wrote:
>>>>
>>>> Interesting. I wonder why that would be?
>>>>
>>>> On Fri, Feb 19, 2016 at 1:19 PM, Alexis de Talhouët <[email protected]> wrote:
>>>>
>>>>     OVS 2.3.x scales fine.
>>>>     OVS 2.4.x doesn't scale well.
>>>>
>>>>     Here is also the docker file for ovs 2.4.1.
>>>>
>>>>> On Feb 19, 2016, at 11:20 AM, Alexis de Talhouët <[email protected]> wrote:
>>>>>
>>>>>> can I use your containers? do you have any scripts/tools to bring
>>>>>> things up/down?
>>>>>
>>>>> Sure, attached is a tar file containing all the scripts / config / dockerfile
>>>>> I'm using to set up docker containers emulating OvS.
>>>>> FYI: it's ovs 2.3.0 and not 2.4.0 anymore.
>>>>>
>>>>> Also, forget about this whole mail thread; something in my private
>>>>> container must be breaking OVS behaviour, I don't know what yet.
>>>>>
>>>>> With the docker file attached here, I can scale to 90+ without any
>>>>> trouble...
>>>>>
>>>>> Thanks,
>>>>> Alexis
>>>>>
>>>>> <ovs_scalability_setup.tar.gz>
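(If you want to poke at this before grabbing the tarball, the setup boils down to a loop like the sketch below. This is my own untested approximation, not Alexis's actual scripts: "ovs-node" and "br0" are placeholder image/bridge names, and it assumes a privileged container with ovs-vsctl available inside.)

    #!/usr/bin/env python
    """Rough sketch: spawn N dockerized OvS instances and point them at ODL.

    Approximation of the attached tarball's workflow; 'ovs-node' is a
    placeholder image name and 'br0' a placeholder bridge name.
    """
    import subprocess

    CONTROLLER = "tcp:192.168.1.159:6633"  # ODL address, taken from the logs below
    NUM_SWITCHES = 50

    def sh(*cmd):
        subprocess.check_call(list(cmd))

    for i in range(NUM_SWITCHES):
        name = "ovs%d" % i
        # one OvS per container; --privileged so openvswitch can use its datapath
        sh("docker", "run", "-d", "--privileged", "--name", name, "ovs-node")
        sh("docker", "exec", name, "ovs-vsctl", "add-br", "br0")
        # give each bridge its own DPID so ODL sees N distinct switches
        sh("docker", "exec", name, "ovs-vsctl", "set", "bridge", "br0",
           "other-config:datapath-id=%016x" % (i + 1))
        sh("docker", "exec", name, "ovs-vsctl", "set-controller", "br0", CONTROLLER)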
>>>>>> On Feb 18, 2016, at 6:07 PM, Jamo Luhrsen <[email protected]> wrote:
>>>>>>
>>>>>> inline...
>>>>>>
>>>>>> On 02/18/2016 02:58 PM, Alexis de Talhouët wrote:
>>>>>>> I'm running OVS 2.4, against stable/lithium, openflowplugin-li.
>>>>>>
>>>>>> so this is one difference between CSIT and your setup, in addition to
>>>>>> the whole containers vs mininet.
>>>>>>
>>>>>>> I never scaled up to 1k; that was in the CSIT job.
>>>>>>> In a real scenario, I scaled to ~400. But that was even before
>>>>>>> clustering came into play in ofp lithium.
>>>>>>>
>>>>>>> I think the logs I sent have trace logging for openflowplugin and
>>>>>>> openflowjava; if that is not the case I can resubmit the logs.
>>>>>>> I removed some of them in openflowjava because it was way too chatty
>>>>>>> (logging the content of all messages between ovs <---> odl).
>>>>>>>
>>>>>>> Unfortunately those IOExceptions happen after the whole thing blows
>>>>>>> up. I was able to narrow down some logs in openflowjava to see the
>>>>>>> first disconnect event. As mentioned in a previous mail (in this
>>>>>>> mail thread), it's the device that is issuing the disconnect:
>>>>>>>
>>>>>>>> 2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | OFFrameDecoder | 201 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.6.4.SNAPSHOT | skipping bytebuf - too few bytes for header: 0 < 8
>>>>>>>> 2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | OFVersionDetector | 201 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.6.4.SNAPSHOT | not enough data
>>>>>>>> 2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | DelegatingInboundHandler | 201 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.6.4.SNAPSHOT | Channel inactive
>>>>>>>> 2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | ConnectionAdapterImpl | 201 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.6.4.SNAPSHOT | ConsumeIntern msg on [id: 0x1efab5fb, /172.18.0.49:36983 :> /192.168.1.159:6633]
>>>>>>>> 2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | ConnectionAdapterImpl | 201 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.6.4.SNAPSHOT | ConsumeIntern msg - DisconnectEvent
>>>>>>>> 2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | ConnectionContextImpl | 205 - org.opendaylight.openflowplugin.impl - 0.1.4.SNAPSHOT | disconnecting: node=/172.18.0.49:36983|auxId=0|connection state = RIP
>>>>>>>
>>>>>>> Those logs come from another run, so they are not in the logs I sent
>>>>>>> earlier, although the behaviour is always the same.
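(Side note: to find that first disconnect quickly in a multi-MB karaf.log, a throwaway script like this is enough. It prints the first matching line per switch address; the patterns are lifted straight from the log excerpts above.)

    #!/usr/bin/env python
    """Print the first disconnect-ish log line per switch from a karaf.log.

    Patterns come from the excerpts in this thread: openflowjava's
    'DisconnectEvent' and openflowplugin's 'disconnecting:' lines.
    """
    import re
    import sys

    PATTERNS = ("ConsumeIntern msg - DisconnectEvent", "disconnecting: node=")
    ADDR = re.compile(r"/(\d+\.\d+\.\d+\.\d+:\d+)")

    seen = set()
    with open(sys.argv[1], errors="replace") as log:
        for line in log:
            if not any(p in line for p in PATTERNS):
                continue
            m = ADDR.search(line)
            key = m.group(1) if m else "<no-addr>"
            if key not in seen:  # only the first event per switch
                seen.add(key)
                print(line.rstrip())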
>>>>>>> Regarding the memory, I don't want to add more than 2G of memory
>>>>>>> because, and I tested it, the more memory I add, the more I can
>>>>>>> scale. But as you pointed out, this issue is not an OOM error. So I
>>>>>>> would rather fail at 2G (fewer docker containers to spawn each run,
>>>>>>> ~50).
>>>>>>
>>>>>> so, maybe reduce your memory then to simplify the reproducing steps.
>>>>>> Since you know that increasing memory allows you to scale further,
>>>>>> but you still hit the problem, let's make it easier to hit. how far
>>>>>> can you go with the max mem set to 500M, if you are only loading
>>>>>> ofp-li?
>>>>>>
>>>>>>> I definitely need some help here, because I can't sort myself out in
>>>>>>> the openflowplugin + openflowjava codebase...
>>>>>>> But I believe I already have Michal's attention :)
>>>>>>
>>>>>> can I use your containers? do you have any scripts/tools to bring
>>>>>> things up/down?
>>>>>> I might be able to try and reproduce myself. I like breaking things :)
>>>>>>
>>>>>> JamO
>>>>>>
>>>>>>> Thanks,
>>>>>>> Alexis
>>>>>>>
>>>>>>>> On Feb 18, 2016, at 5:44 PM, Jamo Luhrsen <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Alexis, don't worry about filing a bug just to give us a common
>>>>>>>> place to work/comment, even if we close it later because of
>>>>>>>> something outside of ODL. Email is fine too.
>>>>>>>>
>>>>>>>> what ovs version do you have in your containers? this test sounds
>>>>>>>> great.
>>>>>>>>
>>>>>>>> Luis is right that if you were scaling well past 1k in the past,
>>>>>>>> but now it falls over at 50, it sounds like a bug.
>>>>>>>>
>>>>>>>> Oh, you can try increasing the jvm max_mem from the default of 2G
>>>>>>>> just as a data point. The fact that you don't get OOMs makes me
>>>>>>>> think memory might not be the final bottleneck.
>>>>>>>>
>>>>>>>> you could enable debug/trace logs in the right modules (need ofp
>>>>>>>> devs to tell us that) for a little more info.
>>>>>>>>
>>>>>>>> I've seen those IOExceptions before and always assumed it was from
>>>>>>>> an OF switch doing a hard RST on its connection.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> JamO
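(For the debug/trace suggestion above: the two loggers that matter in this thread are org.opendaylight.openflowplugin and org.opendaylight.openflowjava. You can flip them live from the karaf console with log:set, or persist them in etc/org.ops4j.pax.logging.cfg. A throwaway sketch of the latter; the KARAF_HOME path is a placeholder for your own distribution directory.)

    #!/usr/bin/env python
    """Persist DEBUG/TRACE loggers for ofp/openflowjava in a karaf distro.

    Appends log4j logger lines to etc/org.ops4j.pax.logging.cfg; the
    distribution path is a placeholder, adjust to your setup.
    """
    import os

    KARAF_HOME = os.path.expanduser("~/distribution-karaf")  # placeholder path
    LOGGERS = {
        "org.opendaylight.openflowplugin": "DEBUG",
        "org.opendaylight.openflowjava": "TRACE",
    }

    cfg = os.path.join(KARAF_HOME, "etc", "org.ops4j.pax.logging.cfg")
    with open(cfg, "a") as f:
        f.write("\n# scalability debugging (this thread)\n")
        for logger, level in LOGGERS.items():
            f.write("log4j.logger.%s=%s\n" % (logger, level))
    print("appended %d loggers to %s" % (len(LOGGERS), cfg))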
>>>>>>>> On 02/18/2016 11:48 AM, Luis Gomez wrote:
>>>>>>>>> If the same test worked 6-8 months ago this seems like a bug, but
>>>>>>>>> please feel free to open it whenever you are sure.
>>>>>>>>>
>>>>>>>>>> On Feb 18, 2016, at 11:45 AM, Alexis de Talhouët <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> Hello Luis,
>>>>>>>>>>
>>>>>>>>>> For sure I'm willing to open a bug, but first I want to make sure
>>>>>>>>>> there is a bug and that I'm not doing something wrong.
>>>>>>>>>> In ODL's infra, there is a test to find the maximum number of
>>>>>>>>>> switches that can be connected to ODL, and this test reaches
>>>>>>>>>> ~500 [0].
>>>>>>>>>> I was able to scale up to 1090 switches [1] using the CSIT job in
>>>>>>>>>> the sandbox.
>>>>>>>>>> I believe the CSIT test is different in that the switches are
>>>>>>>>>> emulated in one mininet VM, whereas I'm connecting OVS instances
>>>>>>>>>> from separate containers.
>>>>>>>>>>
>>>>>>>>>> 6-8 months ago, I was able to perform the same test and scale
>>>>>>>>>> with OVS docker containers up to ~400 before ODL started
>>>>>>>>>> crashing (with some optimization done behind the scenes, i.e.
>>>>>>>>>> ulimit, mem, cpu, GC...).
>>>>>>>>>> Now I'm not able to scale past 100 with the same configuration.
>>>>>>>>>>
>>>>>>>>>> FYI: I just took a quick look at the CSIT test [0] karaf.log; it
>>>>>>>>>> seems the test is actually failing but it is not correctly
>>>>>>>>>> advertised... switch connections are dropped.
>>>>>>>>>> Look for these:
>>>>>>>>>>
>>>>>>>>>> 2016-02-18 07:07:51,741 | WARN | entLoopGroup-6-6 | OFFrameDecoder | 181 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.6.4.SNAPSHOT | Unexpected exception from downstream.
>>>>>>>>>> java.io.IOException: Connection reset by peer
>>>>>>>>>>   at sun.nio.ch.FileDispatcherImpl.read0(Native Method)[:1.7.0_85]
>>>>>>>>>>   at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)[:1.7.0_85]
>>>>>>>>>>   at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)[:1.7.0_85]
>>>>>>>>>>   at sun.nio.ch.IOUtil.read(IOUtil.java:192)[:1.7.0_85]
>>>>>>>>>>   at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:384)[:1.7.0_85]
>>>>>>>>>>   at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311)[111:io.netty.buffer:4.0.26.Final]
>>>>>>>>>>   at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)[111:io.netty.buffer:4.0.26.Final]
>>>>>>>>>>   at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:241)[109:io.netty.transport:4.0.26.Final]
>>>>>>>>>>   at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)[109:io.netty.transport:4.0.26.Final]
>>>>>>>>>>   at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)[109:io.netty.transport:4.0.26.Final]
>>>>>>>>>>   at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)[109:io.netty.transport:4.0.26.Final]
>>>>>>>>>>   at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)[109:io.netty.transport:4.0.26.Final]
>>>>>>>>>>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:349)[109:io.netty.transport:4.0.26.Final]
>>>>>>>>>>   at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)[110:io.netty.common:4.0.26.Final]
>>>>>>>>>>   at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)[110:io.netty.common:4.0.26.Final]
>>>>>>>>>>   at java.lang.Thread.run(Thread.java:745)[:1.7.0_85]
>>>>>>>>>>
>>>>>>>>>> [0]: https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-1node-periodic-scalability-daily-only-stable-lithium/
>>>>>>>>>> [1]: https://git.opendaylight.org/gerrit/#/c/33213/
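(That "failing but not advertised" point is worth automating: rather than trusting the suite's verdict, count what actually made it into the operational inventory. A quick sketch against the stock RESTCONF endpoint; the host/port and the default admin/admin credentials are assumptions, adjust to your deployment.)

    #!/usr/bin/env python
    """Count switches actually present in ODL's operational inventory.

    Polls the stock RESTCONF inventory endpoint; host/port and the default
    admin/admin credentials are assumptions.
    """
    import base64
    import json
    import urllib.request

    URL = "http://127.0.0.1:8181/restconf/operational/opendaylight-inventory:nodes"
    AUTH = base64.b64encode(b"admin:admin").decode()

    req = urllib.request.Request(URL, headers={"Authorization": "Basic " + AUTH,
                                               "Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        nodes = json.load(resp).get("nodes", {}).get("node", [])

    # OVS/mininet datapaths show up with ids of the form openflow:<dpid>
    openflow_nodes = [n["id"] for n in nodes if n["id"].startswith("openflow:")]
    print("connected openflow nodes: %d" % len(openflow_nodes))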
>>>>>>>>>>> On Feb 18, 2016, at 2:28 PM, Luis Gomez <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Alexis, thanks very much for sharing this test. Would you mind
>>>>>>>>>>> opening a bug with all this info so we can track this?
>>>>>>>>>>>
>>>>>>>>>>>> On Feb 18, 2016, at 7:29 AM, Alexis de Talhouët <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Michal,
>>>>>>>>>>>>
>>>>>>>>>>>> ODL memory is capped at 2GB; the more memory I add, the more
>>>>>>>>>>>> OVS instances I can connect. Regarding CPU, it's around 10-20%
>>>>>>>>>>>> when connecting new OVS instances, with some peaks to 80%.
>>>>>>>>>>>>
>>>>>>>>>>>> After some investigation, here is what I observed:
>>>>>>>>>>>> Let's say I have 50 switches connected, stats manager disabled.
>>>>>>>>>>>> I have one open socket per switch, plus an additional one for
>>>>>>>>>>>> the controller.
>>>>>>>>>>>> Then I connect a new switch (2016-02-18 09:35:08,059), 51
>>>>>>>>>>>> switches... something happens that causes all connections to be
>>>>>>>>>>>> dropped (by the device?), and then ODL tries to recreate them
>>>>>>>>>>>> and goes into a crazy loop where it is never able to
>>>>>>>>>>>> re-establish communication, but keeps creating new sockets.
>>>>>>>>>>>> I suspect something is being garbage collected due to lack of
>>>>>>>>>>>> memory, although there are no OOM errors.
>>>>>>>>>>>>
>>>>>>>>>>>> Attached are the YourKit Java Profiler analysis for the
>>>>>>>>>>>> described scenario and the logs [1].
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Alexis
>>>>>>>>>>>>
>>>>>>>>>>>> [1]: https://www.dropbox.com/sh/dgqeqv4j76zwbh3/AACim0za1fUozc7DlYJ4fsMJa?dl=0
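(The one-socket-per-switch observation is easy to watch live from the controller host. A crude way, assuming Linux and the default OpenFlow port 6633, is to count established sockets straight out of /proc/net/tcp.)

    #!/usr/bin/env python
    """Count established TCP connections on the OpenFlow port (Linux only).

    One quick way to watch the one-socket-per-switch behaviour described
    above; 6633 is the default OpenFlow port used in this thread.
    """
    OF_PORT = 6633
    ESTABLISHED = "01"  # TCP_ESTABLISHED state code in /proc/net/tcp

    def connected_switches():
        count = 0
        with open("/proc/net/tcp") as f:
            next(f)  # skip the header line
            for line in f:
                fields = line.split()
                local_addr, state = fields[1], fields[3]
                port = int(local_addr.split(":")[1], 16)  # port is hex-encoded
                if port == OF_PORT and state == ESTABLISHED:
                    count += 1
        return count

    if __name__ == "__main__":
        print("established OpenFlow connections: %d" % connected_switches())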
You >>>>>>>>>>>>>> should have as many cores as possible in order to get max >>>>>>>>>>>>>> performance. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>> Michal >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> ________________________________________ >>>>>>>>>>>>>> From: [email protected] >>>>>>>>>>>>>> <mailto:[email protected]> >>>>>>>>>>>>>> <mailto:[email protected]> >>>>>>>>>>>>>> <mailto:[email protected]> >>>>>>>>>>>>>> <[email protected] >>>>>>>>>>>>>> <mailto:[email protected]> >>>>>>>>>>>>>> <mailto:[email protected]> >>>>>>>>>>>>>> <mailto:[email protected]>> >>>>>>>>>>>>>> on >>>>>>>>>>>>>> behalf of Alexis de Talhouët <[email protected] >>>>>>>>>>>>>> <mailto:[email protected]> >>>>>>>>>>>>>> <mailto:[email protected]> >>>>>>>>>>>>>> <mailto:[email protected]>> >>>>>>>>>>>>>> Sent: Tuesday, February 9, 2016 00:45 >>>>>>>>>>>>>> To: [email protected] >>>>>>>>>>>>>> <mailto:[email protected]> >>>>>>>>>>>>>> <mailto:[email protected]> >>>>>>>>>>>>>> <mailto:[email protected]> >>>>>>>>>>>>>> Subject: [openflowplugin-dev] Scalability issues >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hello openflowplugin-dev, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I’m currently running some scalability test against >>>>>>>>>>>>>> openflowplugin-li plugin, stable/lithium. >>>>>>>>>>>>>> Playing with CSIT job, I was able to connect up to 1090 >>>>>>>>>>>>>> switches: https://git.opendaylight.org/gerrit/#/c/33213/ >>>>>>>>>>>>>> >>>>>>>>>>>>>> I’m now running the test against 40 OvS switches, each one of >>>>>>>>>>>>>> them is in a docker container. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Connecting around 30 of them works fine, but then, adding a >>>>>>>>>>>>>> new one break completely ODL, it goes crazy and >>>>>>>>>>>>>> unresponsible. >>>>>>>>>>>>>> Attach a snippet of the karaf.log with log set to DEBUG for >>>>>>>>>>>>>> org.opendaylight.openflowplugin, thus it’s a >>>>>>>>>>>>>> really >>>>>>>>>>>>>> big log (~2.5MB). >>>>>>>>>>>>>> >>>>>>>>>>>>>> Here it what I observed based on the log: >>>>>>>>>>>>>> I have 30 switches connected, all works fine. Then I add a >>>>>>>>>>>>>> new one: >>>>>>>>>>>>>> - SalRoleServiceImpl starts doing its thing (2016-02-08 >>>>>>>>>>>>>> 23:13:38,534) >>>>>>>>>>>>>> - RpcManagerImpl Registering Openflow RPCs (2016-02-08 >>>>>>>>>>>>>> 23:13:38,546) >>>>>>>>>>>>>> - ConnectionAdapterImpl Hello received (2016-02-08 >>>>>>>>>>>>>> 23:13:40,520) >>>>>>>>>>>>>> - Creation of the transaction chain, … >>>>>>>>>>>>>> >>>>>>>>>>>>>> Then all starts failing apart with this log: >>>>>>>>>>>>>>> 2016-02-08 23:13:50,021 | DEBUG | ntLoopGroup-11-9 | >>>>>>>>>>>>>>> ConnectionContextImpl | 190 - >>>>>>>>>>>>>>> org.opendaylight.openflowplugin.impl - 0.1.4.SNAPSHOT | >>>>>>>>>>>>>>> disconnecting: >>>>>>>>>>>>>>> node=/172.31.100.9:46736|auxId=0|connection state = RIP >>>>>>>>>>>>>> End then ConnectionContextImpl disconnects one by one the >>>>>>>>>>>>>> switches, RpcManagerImpl is unregistered >>>>>>>>>>>>>> Then it goes crazy for a while. >>>>>>>>>>>>>> But all I’ve done is adding a new switch.. 
>>>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>>>> From: [email protected] on behalf of Alexis de Talhouët <[email protected]>
>>>>>>>>>>>>>> Sent: Tuesday, February 9, 2016 00:45
>>>>>>>>>>>>>> To: [email protected]
>>>>>>>>>>>>>> Subject: [openflowplugin-dev] Scalability issues
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello openflowplugin-dev,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm currently running some scalability tests against the
>>>>>>>>>>>>>> openflowplugin-li plugin, stable/lithium.
>>>>>>>>>>>>>> Playing with the CSIT job, I was able to connect up to 1090
>>>>>>>>>>>>>> switches: https://git.opendaylight.org/gerrit/#/c/33213/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm now running the test against 40 OvS switches, each one of
>>>>>>>>>>>>>> them in a docker container.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Connecting around 30 of them works fine, but then adding a new
>>>>>>>>>>>>>> one completely breaks ODL; it goes crazy and becomes
>>>>>>>>>>>>>> unresponsive.
>>>>>>>>>>>>>> Attached is a snippet of the karaf.log with logging set to
>>>>>>>>>>>>>> DEBUG for org.opendaylight.openflowplugin, so it's a really
>>>>>>>>>>>>>> big log (~2.5MB).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here is what I observed based on the log:
>>>>>>>>>>>>>> I have 30 switches connected, all works fine. Then I add a new
>>>>>>>>>>>>>> one:
>>>>>>>>>>>>>> - SalRoleServiceImpl starts doing its thing (2016-02-08 23:13:38,534)
>>>>>>>>>>>>>> - RpcManagerImpl registers the Openflow RPCs (2016-02-08 23:13:38,546)
>>>>>>>>>>>>>> - ConnectionAdapterImpl receives the Hello (2016-02-08 23:13:40,520)
>>>>>>>>>>>>>> - Creation of the transaction chain, ...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Then it all starts falling apart with this log:
>>>>>>>>>>>>>>> 2016-02-08 23:13:50,021 | DEBUG | ntLoopGroup-11-9 | ConnectionContextImpl | 190 - org.opendaylight.openflowplugin.impl - 0.1.4.SNAPSHOT | disconnecting: node=/172.31.100.9:46736|auxId=0|connection state = RIP
>>>>>>>>>>>>>> And then ConnectionContextImpl disconnects the switches one by
>>>>>>>>>>>>>> one and RpcManagerImpl is unregistered. Then it goes crazy for
>>>>>>>>>>>>>> a while.
>>>>>>>>>>>>>> But all I've done is add a new switch..
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Finally, at 2016-02-08 23:14:26,666, exceptions are thrown:
>>>>>>>>>>>>>>> 2016-02-08 23:14:26,666 | ERROR | lt-dispatcher-85 | LocalThreePhaseCommitCohort | 172 - org.opendaylight.controller.sal-distributed-datastore - 1.2.4.SNAPSHOT | Failed to prepare transaction member-1-chn-5-txn-180 on backend
>>>>>>>>>>>>>>> akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka://opendaylight-cluster-data/), Path(/user/shardmanager-operational/member-1-shard-inventory-operational#-1518836725)]] after [30000 ms]
>>>>>>>>>>>>>> And this goes on for a while.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Do you have any input on this?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Could you give some advice on how to scale? (I know disabling
>>>>>>>>>>>>>> the StatisticsManager can help, for instance.)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Am I doing something wrong?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I can provide any requested information regarding the issue
>>>>>>>>>>>>>> I'm facing.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Alexis
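(On the StatisticsManager aside: Michal's "disable it via rpc" maps, if I remember the Li-design yang correctly, to a change-statistics-work-mode RPC. Sketch below; the module/RPC names and the FULLYDISABLED mode are from memory, so verify them against the openflowplugin yang models before relying on this.)

    #!/usr/bin/env python
    """Turn off Li-design statistics polling via RESTCONF.

    The RPC path (statistics-manager-control:change-statistics-work-mode)
    and the FULLYDISABLED mode are recalled from memory -- verify against
    the openflowplugin yang before trusting this sketch.
    """
    import base64
    import json
    import urllib.request

    URL = ("http://127.0.0.1:8181/restconf/operations/"
           "statistics-manager-control:change-statistics-work-mode")
    AUTH = base64.b64encode(b"admin:admin").decode()
    BODY = json.dumps({"input": {"mode": "FULLYDISABLED"}}).encode()

    req = urllib.request.Request(URL, data=BODY, headers={
        "Authorization": "Basic " + AUTH,
        "Content-Type": "application/json",
    })
    with urllib.request.urlopen(req, timeout=10) as resp:
        print("RPC HTTP status:", resp.status)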
_______________________________________________
openflowplugin-dev mailing list
[email protected]
https://lists.opendaylight.org/mailman/listinfo/openflowplugin-dev