I'm not sure it would have anything to do with it, but this is an
issue I keep bumping into with OVS 2.4:

https://bugs.opendaylight.org/show_bug.cgi?id=5173

Our CSIT is still on version 2.0, but other than that I wonder what
the big difference is such that CSIT scales towards 500 whereas you are
hitting a wall around 50.

I haven't had a chance to try your containers yet, Alexis, but I hope to
soon.

JamO

On 02/19/2016 02:19 PM, Alexis de Talhouët wrote:
> "What I see as a result is that 2.3 scales better,
> and keeps substantially more consistent results as the number of clients
> increases, whereas 2.4 gets consistently worse as the number of clients
> increases.  I.e., 2.4 does not scale particularly well.  The test ends
> after 3 consecutive non-improving increments.”
> 
> from here: http://www.openldap.org/lists/openldap-devel/200908/msg00023.html
> 
> Nice explanation of the why
> 
>> On Feb 19, 2016, at 5:10 PM, Alexis de Talhouët <[email protected] 
>> <mailto:[email protected]>> wrote:
>>
>> So far my results are:
>>
>> OVS 2.4.0: ODL configured with 2G of mem -> max is ~50 switches connected
>> OVS 2.3.1: ODL configured with 256MB of mem -> I currently have 150 switches connected;
>> can't scale more due to infra limits.
>>
>> I will pursue my testing next week.
>>
>> Thanks,
>> Alexis
>>
>>> On Feb 19, 2016, at 5:06 PM, Abhijit Kumbhare <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>>
>>> Interesting. I wonder why that would be?
>>>
>>> On Fri, Feb 19, 2016 at 1:19 PM, Alexis de Talhouët 
>>> <[email protected] <mailto:[email protected]>> wrote:
>>>
>>>     OVS 2.3.x scales fine
>>>     OVS 2.4.x doesn’t scale well.
>>>
>>>     Here is also the Dockerfile for OVS 2.4.1.
>>>
>>>
>>>
>>>>     On Feb 19, 2016, at 11:20 AM, Alexis de Talhouët 
>>>> <[email protected] <mailto:[email protected]>> wrote:
>>>>
>>>>>     can I use your containers?  do you have any scripts/tools to bring 
>>>>> things up/down?
>>>>
>>>>     Sure, attached is a tar file containing all the scripts / config / Dockerfile
>>>>     I'm using to set up the docker containers emulating OvS.
>>>>     FYI: it's OVS 2.3.0 and not 2.4.0 anymore.
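[Since the attachment itself is not reproduced here, below is a minimal sketch of the kind of bring-up script being discussed. The image name, controller address, bridge name, and numbering scheme are assumptions, not taken from the attached tarball.]

    #!/usr/bin/env python
    # Sketch: start N docker containers, each running OVS, and point each switch at ODL.
    # Assumes an "ovs" image whose entrypoint starts ovsdb-server and ovs-vswitchd, and
    # that ODL listens on tcp:192.168.1.159:6633 (the address seen in the logs below);
    # adjust both to your environment.
    import subprocess

    CONTROLLER = "tcp:192.168.1.159:6633"
    IMAGE = "ovs:2.3.0"          # hypothetical image name
    NUM_SWITCHES = 50

    def ovs_vsctl(container, *args):
        subprocess.check_call(["docker", "exec", container, "ovs-vsctl"] + list(args))

    for i in range(NUM_SWITCHES):
        name = "ovs-%d" % i
        subprocess.check_call(["docker", "run", "-d", "--privileged", "--name", name, IMAGE])
        ovs_vsctl(name, "add-br", "br0")
        # Give each emulated switch a unique, deterministic datapath-id.
        ovs_vsctl(name, "set", "bridge", "br0", "other-config:datapath-id=%016x" % (i + 1))
        ovs_vsctl(name, "set-controller", "br0", CONTROLLER)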
>>>>
>>>>     Also, forget about this whole mail thread: something in my private
>>>>     container must be breaking OVS behaviour, I don't know what yet.
>>>>
>>>>     With the Dockerfile attached here, I can scale to 90+ without any
>>>>     trouble...
>>>>
>>>>     Thanks,
>>>>     Alexis
>>>>
>>>>     <ovs_scalability_setup.tar.gz>
>>>>
>>>>>     On Feb 18, 2016, at 6:07 PM, Jamo Luhrsen <[email protected] 
>>>>> <mailto:[email protected]>> wrote:
>>>>>
>>>>>     inline...
>>>>>
>>>>>     On 02/18/2016 02:58 PM, Alexis de Talhouët wrote:
>>>>>>     I’m running OVS 2.4, against stable/lithium, openflowplugin-li
>>>>>
>>>>>
>>>>>     so this is one difference between CSIT and your setup, in addition to
>>>>>     the whole containers-vs-mininet question.
>>>>>
>>>>>>     I never scaled up to 1k myself; that was in the CSIT job.
>>>>>>     In a real scenario, I scaled to ~400, but that was before clustering
>>>>>>     came into play in ofp Lithium.
>>>>>>
>>>>>>     I think the logs I sent have trace logging for openflowplugin and
>>>>>>     openflowjava; if that's not the case I can resubmit them.
>>>>>>     I removed some of the openflowjava logging because it was way too chatty
>>>>>>     (logging all message content between ovs <---> odl).
>>>>>>
>>>>>>     Unfortunately those IOExceptions happen after the whole thing blows up.
>>>>>>     I was able to narrow down some logs in openflowjava to see the first
>>>>>>     disconnect event. As mentioned in a previous mail (in this thread),
>>>>>>     it's the device that is issuing the disconnect:
>>>>>>
>>>>>>>     2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | OFFrameDecoder 
>>>>>>>                   | 201 -
>>>>>>>     org.opendaylight.openflowjava.openflow-protocol-impl - 
>>>>>>> 0.6.4.SNAPSHOT | skipping bytebuf - too few bytes for
>>>>>>>     header: 0 < 8
>>>>>>>     2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | 
>>>>>>> OFVersionDetector                | 201 -
>>>>>>>     org.opendaylight.openflowjava.openflow-protocol-impl - 
>>>>>>> 0.6.4.SNAPSHOT | not enough data
>>>>>>>     2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | 
>>>>>>> DelegatingInboundHandler         | 201 -
>>>>>>>     org.opendaylight.openflowjava.openflow-protocol-impl - 
>>>>>>> 0.6.4.SNAPSHOT | Channel inactive
>>>>>>>     2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | 
>>>>>>> ConnectionAdapterImpl            | 201 -
>>>>>>>     org.opendaylight.openflowjava.openflow-protocol-impl - 
>>>>>>> 0.6.4.SNAPSHOT | ConsumeIntern msg on [id: 0x1efab5fb,
>>>>>>>     /172.18.0.49:36983 <http://172.18.0.49:36983/> :> 
>>>>>>> /192.168.1.159:6633 <http://192.168.1.159:6633/>]
>>>>>>>     2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | 
>>>>>>> ConnectionAdapterImpl            | 201 -
>>>>>>>     org.opendaylight.openflowjava.openflow-protocol-impl - 
>>>>>>> 0.6.4.SNAPSHOT | ConsumeIntern msg - DisconnectEvent
>>>>>>>     2016-02-18 16:56:30,018 | DEBUG | entLoopGroup-6-3 | 
>>>>>>> ConnectionContextImpl            | 205 -
>>>>>>>     org.opendaylight.openflowplugin.impl - 0.1.4.SNAPSHOT | 
>>>>>>> disconnecting: node=/172.18.0.49:36983|auxId=0|connection
>>>>>>>     state = RIP
>>>>>>
>>>>>>     Those logs come from another run, so they are not in the logs I sent
>>>>>>     earlier, although the behaviour is always the same.
>>>>>>
>>>>>>     Regarding the memory, I don't want to add more than 2G because, as I
>>>>>>     tested, the more memory I add, the more I can scale. But as you pointed
>>>>>>     out, this issue is not an OOM error. So I'd rather fail at 2G (fewer
>>>>>>     docker containers to spawn each run, ~50).
>>>>>
>>>>>     so, maybe reduce your memory then to simplify the reproduction steps.  since
>>>>>     you know that increasing memory lets you scale further but you still hit the
>>>>>     problem, let's make it easier to hit.  how far can you go with the max mem
>>>>>     set to 500M, if you are only loading ofp-li?
>>>>>
>>>>>>     I definitely need some help here, because I can't find my way around
>>>>>>     the openflowplugin + openflowjava codebase…
>>>>>>     But I believe I already have Michal’s attention :)
>>>>>
>>>>>     can I use your containers?  do you have any scripts/tools to bring
>>>>>     things up/down?
>>>>>     I might be able to try to reproduce it myself.  I like breaking things
>>>>>     :)
>>>>>
>>>>>     JamO
>>>>>
>>>>>
>>>>>>
>>>>>>     Thanks,
>>>>>>     Alexis
>>>>>>
>>>>>>
>>>>>>>     On Feb 18, 2016, at 5:44 PM, Jamo Luhrsen <[email protected]
>>>>>>>     <mailto:[email protected]> <mailto:[email protected]>> wrote:
>>>>>>>
>>>>>>>     Alexis,  don't worry about filing a bug just to give us a common 
>>>>>>> place to work/comment, even
>>>>>>>     if we close it later because of something outside of ODL.  Email is 
>>>>>>> fine too.
>>>>>>>
>>>>>>>     what ovs version do you have in your containers?  this test sounds 
>>>>>>> great.
>>>>>>>
>>>>>>>     Luis is right: if you were scaling well past 1k in the past but it
>>>>>>>     now falls over at 50, that sounds like a bug.
>>>>>>>
>>>>>>>     Oh, you can try increasing the JVM max_mem from the default of 2G,
>>>>>>>     just as a data point.  The fact that you don't get OOMs makes me think
>>>>>>>     memory might not be the final bottleneck.
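[As a rough sketch of that data point, assuming the stock Karaf launcher, which picks up JAVA_MIN_MEM/JAVA_MAX_MEM from the environment; the install path is a placeholder.]

    #!/usr/bin/env python
    # Sketch: start ODL's Karaf with a larger max heap, as a data point.
    # KARAF_HOME is a placeholder; the stock karaf start script honors JAVA_MAX_MEM.
    import os
    import subprocess

    KARAF_HOME = "/opt/distribution-karaf"   # placeholder path
    env = dict(os.environ, JAVA_MIN_MEM="512m", JAVA_MAX_MEM="4096m")
    subprocess.check_call([os.path.join(KARAF_HOME, "bin", "karaf"), "server"], env=env)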
>>>>>>>
>>>>>>>     you could enable debug/trace logs in the right modules (we'd need the
>>>>>>>     ofp devs to tell us which) for a little more info.
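[One way to flip those loggers, sketched via the Karaf console client; the logger names below are simply the two packages that appear in the logs in this thread, the client path is a placeholder, and credentials may be needed.]

    #!/usr/bin/env python
    # Sketch: set the openflowplugin and openflowjava loggers to DEBUG through
    # the Karaf console client. Pass -u/-p to bin/client if your setup needs it.
    import subprocess

    KARAF_CLIENT = "/opt/distribution-karaf/bin/client"   # placeholder path
    for logger in ("org.opendaylight.openflowplugin", "org.opendaylight.openflowjava"):
        subprocess.check_call([KARAF_CLIENT, "log:set DEBUG %s" % logger])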
>>>>>>>
>>>>>>>     I've seen those IOExceptions before and always assumed it was from
>>>>>>>     an OF switch doing a hard RST on its connection.
>>>>>>>
>>>>>>>
>>>>>>>     Thanks,
>>>>>>>     JamO
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>     On 02/18/2016 11:48 AM, Luis Gomez wrote:
>>>>>>>>     If the same test worked 6-8 months ago this seems like a bug, but 
>>>>>>>> please feel free to open it whenever you
>>>>>>>>     are sure.
>>>>>>>>
>>>>>>>>>     On Feb 18, 2016, at 11:45 AM, Alexis de Talhouët 
>>>>>>>>> <[email protected] <mailto:[email protected]>
>>>>>>>>>     <mailto:[email protected]> 
>>>>>>>>> <mailto:[email protected]>> wrote:
>>>>>>>>>
>>>>>>>>>     Hello Luis,
>>>>>>>>>
>>>>>>>>>     For sure I'm willing to open a bug, but first I want to make sure there
>>>>>>>>>     is a bug and that I'm not doing something wrong.
>>>>>>>>>     In ODL's infra, there is a test to find the maximum number of switches
>>>>>>>>>     that can be connected to ODL, and this test reaches ~500 [0].
>>>>>>>>>     I was able to scale up to 1090 switches [1] using the CSIT job in the sandbox.
>>>>>>>>>     I believe the CSIT test is different in that the switches are emulated in
>>>>>>>>>     one mininet VM, whereas I'm connecting OVS instances from separate containers.
>>>>>>>>>
>>>>>>>>>     6-8 months ago, I was able to perform the same test and scale with OVS
>>>>>>>>>     docker containers up to ~400 before ODL started crashing (with some
>>>>>>>>>     optimization done behind the scenes, i.e. ulimit, mem, cpu, GC…).
>>>>>>>>>     Now I'm not able to scale past 100 with the same configuration.
>>>>>>>>>
>>>>>>>>>     FYI: I just took a quick look at the CSIT test [0] karaf.log, and it
>>>>>>>>>     seems the test is actually failing but it is not correctly advertised…
>>>>>>>>>     switch connections are dropped.
>>>>>>>>>     Look for these:
>>>>>>>>>     2016-02-18 07:07:51,741 | WARN  | entLoopGroup-6-6 | 
>>>>>>>>> OFFrameDecoder                   | 181 -
>>>>>>>>>     org.opendaylight.openflowjava.openflow-protocol-impl - 
>>>>>>>>> 0.6.4.SNAPSHOT | Unexpected exception from downstream.
>>>>>>>>>     java.io.IOException: Connection reset by peer
>>>>>>>>>     at sun.nio.ch.FileDispatcherImpl.read0(Native Method)[:1.7.0_85]
>>>>>>>>>     at 
>>>>>>>>> sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)[:1.7.0_85]
>>>>>>>>>     at 
>>>>>>>>> sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)[:1.7.0_85]
>>>>>>>>>     at sun.nio.ch.IOUtil.read(IOUtil.java:192)[:1.7.0_85]
>>>>>>>>>     at 
>>>>>>>>> sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:384)[:1.7.0_85]
>>>>>>>>>     at
>>>>>>>>>     
>>>>>>>>> io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311)[111:io.netty.buffer:4.0.26.Final]
>>>>>>>>>     at 
>>>>>>>>> io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)[111:io.netty.buffer:4.0.26.Final]
>>>>>>>>>     at
>>>>>>>>>     
>>>>>>>>> io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:241)[109:io.netty.transport:4.0.26.Final]
>>>>>>>>>     at
>>>>>>>>>     
>>>>>>>>> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)[109:io.netty.transport:4.0.26.Final]
>>>>>>>>>     at
>>>>>>>>>     
>>>>>>>>> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)[109:io.netty.transport:4.0.26.Final]
>>>>>>>>>     at
>>>>>>>>>     
>>>>>>>>> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)[109:io.netty.transport:4.0.26.Final]
>>>>>>>>>     at
>>>>>>>>>     
>>>>>>>>> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)[109:io.netty.transport:4.0.26.Final]
>>>>>>>>>     at 
>>>>>>>>> io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:349)[109:io.netty.transport:4.0.26.Final]
>>>>>>>>>     at
>>>>>>>>>     
>>>>>>>>> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)[110:io.netty.common:4.0.26.Final]
>>>>>>>>>     at
>>>>>>>>>     
>>>>>>>>> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)[110:io.netty.common:4.0.26.Final]
>>>>>>>>>     at java.lang.Thread.run(Thread.java:745)[:1.7.0_85]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>     [0]: 
>>>>>>>>> https://jenkins.opendaylight.org/releng/view/openflowplugin/job/openflowplugin-csit-1node-periodic-scalability-daily-only-stable-lithium/
>>>>>>>>>     [1]: https://git.opendaylight.org/gerrit/#/c/33213/
>>>>>>>>>
>>>>>>>>>>     On Feb 18, 2016, at 2:28 PM, Luis Gomez <[email protected]
>>>>>>>>>>     <mailto:[email protected]> <mailto:[email protected]>> wrote:
>>>>>>>>>>
>>>>>>>>>>     Alexis, thanks very much for sharing this test. Would you mind opening
>>>>>>>>>>     a bug with all this info so we can track it?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>     On Feb 18, 2016, at 7:29 AM, Alexis de Talhouët 
>>>>>>>>>>> <[email protected]
>>>>>>>>>>>     <mailto:[email protected]> 
>>>>>>>>>>> <mailto:[email protected]>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>     Hi Michal,
>>>>>>>>>>>
>>>>>>>>>>>     ODL memory is capped at 2 GB; the more memory I add, the more OVS
>>>>>>>>>>>     instances I can connect. Regarding CPU, it's around 10-20% when
>>>>>>>>>>>     connecting new OVS instances, with some peaks up to 80%.
>>>>>>>>>>>
>>>>>>>>>>>     After some investigation, here is what I observed:
>>>>>>>>>>>     Let's say I have 50 switches connected, stats manager disabled. I have
>>>>>>>>>>>     one open socket per switch, plus an additional one for the controller.
>>>>>>>>>>>     Then I connect a new switch (2016-02-18 09:35:08,059), making 51…
>>>>>>>>>>>     something happens that causes all connections to be dropped (by the
>>>>>>>>>>>     devices?), and then ODL tries to recreate them and goes into a crazy
>>>>>>>>>>>     loop where it is never able to re-establish communication, but keeps
>>>>>>>>>>>     creating new sockets.
>>>>>>>>>>>     I suspect something is being garbage collected due to lack of memory,
>>>>>>>>>>>     although there are no OOM errors.
>>>>>>>>>>>
>>>>>>>>>>>     Attached are the YourKit Java Profiler analysis for the described
>>>>>>>>>>>     scenario and the logs [1].
>>>>>>>>>>>
>>>>>>>>>>>     Thanks,
>>>>>>>>>>>     Alexis
>>>>>>>>>>>
>>>>>>>>>>>     [1]: 
>>>>>>>>>>> https://www.dropbox.com/sh/dgqeqv4j76zwbh3/AACim0za1fUozc7DlYJ4fsMJa?dl=0
>>>>>>>>>>>
>>>>>>>>>>>>     On Feb 9, 2016, at 8:59 AM, Michal Rehak -X (mirehak - 
>>>>>>>>>>>> PANTHEON TECHNOLOGIES at Cisco)
>>>>>>>>>>>>     <[email protected] <mailto:[email protected]>
>>>>>>>>>>>>     <mailto:[email protected]>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>     Hi Alexis,
>>>>>>>>>>>>     I am not sure how OVS uses threads; in the changelog there are some
>>>>>>>>>>>>     concurrency-related improvements in 2.1.3 and 2.3.
>>>>>>>>>>>>     Also, I guess docker can be constrained in the resources it is assigned.
>>>>>>>>>>>>
>>>>>>>>>>>>     For you, the most important thing is the number of cores used by the
>>>>>>>>>>>>     controller.
>>>>>>>>>>>>
>>>>>>>>>>>>     What does your CPU and memory consumption look like when you connect
>>>>>>>>>>>>     all the OVSs?
>>>>>>>>>>>>
>>>>>>>>>>>>     Regards,
>>>>>>>>>>>>     Michal
>>>>>>>>>>>>
>>>>>>>>>>>>     ________________________________________
>>>>>>>>>>>>     From: Alexis de Talhouët <[email protected]
>>>>>>>>>>>>     <mailto:[email protected]> 
>>>>>>>>>>>> <mailto:[email protected]>>
>>>>>>>>>>>>     Sent: Tuesday, February 9, 2016 14:44
>>>>>>>>>>>>     To: Michal Rehak -X (mirehak - PANTHEON TECHNOLOGIES at Cisco)
>>>>>>>>>>>>     Cc: [email protected]
>>>>>>>>>>>>     <mailto:[email protected]> 
>>>>>>>>>>>> <mailto:[email protected]>
>>>>>>>>>>>>     Subject: Re: [openflowplugin-dev] Scalability issues
>>>>>>>>>>>>
>>>>>>>>>>>>     Hello Michal,
>>>>>>>>>>>>
>>>>>>>>>>>>     Yes, each OvS instance I'm running has a unique DPID.
>>>>>>>>>>>>
>>>>>>>>>>>>     Regarding the thread limit for netty, I'm running the tests on a
>>>>>>>>>>>>     server that has 28 CPUs.
>>>>>>>>>>>>
>>>>>>>>>>>>     Is each OvS instance assigned its own thread?
>>>>>>>>>>>>
>>>>>>>>>>>>     Thanks,
>>>>>>>>>>>>     Alexis
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>     On Feb 9, 2016, at 3:42 AM, Michal Rehak -X (mirehak - 
>>>>>>>>>>>>> PANTHEON TECHNOLOGIES at Cisco)
>>>>>>>>>>>>>     <[email protected] <mailto:[email protected]>
>>>>>>>>>>>>>     <mailto:[email protected]>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>     Hi Alexis,
>>>>>>>>>>>>>     In the Li design, the stats manager is not a standalone app but part
>>>>>>>>>>>>>     of the core of ofPlugin. You can disable it via RPC.
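[For illustration, here is roughly what invoking such an RPC over RESTCONF could look like. The module/RPC name and payload below are assumptions based on the statistics-manager-control model and should be verified against the YANG shipped with the build; the controller address and credentials are placeholders.]

    #!/usr/bin/env python
    # Sketch: turn statistics collection off via a RESTCONF operation.
    # The operation name and the "FULLYDISABLED" value are assumptions; verify
    # them against the statistics-manager-control model in your ODL build.
    import requests

    ODL = "http://192.168.1.159:8181"        # controller address, placeholder
    url = ODL + "/restconf/operations/statistics-manager-control:change-statistics-work-mode"
    body = {"input": {"mode": "FULLYDISABLED"}}
    resp = requests.post(url, json=body, auth=("admin", "admin"))   # default ODL creds, adjust
    resp.raise_for_status()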
>>>>>>>>>>>>>
>>>>>>>>>>>>>     Just a question regarding your OVS setup: do all of your switches
>>>>>>>>>>>>>     have unique DPIDs?
>>>>>>>>>>>>>
>>>>>>>>>>>>>     Also, there is a limit for netty in the form of the number of threads
>>>>>>>>>>>>>     it uses. By default it uses 2 x the number of CPU cores. You should
>>>>>>>>>>>>>     have as many cores as possible in order to get max performance.
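[A quick sanity check of what that default works out to: Netty 4 sizes its default event-loop group at 2 x available processors, so the 28-CPU server mentioned earlier in the thread would get 56 event-loop threads by default.]

    #!/usr/bin/env python
    # Sketch: compute the expected default netty event-loop thread count
    # (2 x CPU cores) for the machine this runs on.
    import multiprocessing

    cores = multiprocessing.cpu_count()
    print("expected default netty event-loop threads: %d" % (2 * cores))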
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>     Regards,
>>>>>>>>>>>>>     Michal
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>     ________________________________________
>>>>>>>>>>>>>     From: [email protected]
>>>>>>>>>>>>>     <mailto:[email protected]> 
>>>>>>>>>>>>> <mailto:[email protected]>
>>>>>>>>>>>>>     <[email protected]
>>>>>>>>>>>>>     <mailto:[email protected]> 
>>>>>>>>>>>>> <mailto:[email protected]>>
>>>>>>>>>>>>>     on
>>>>>>>>>>>>>     behalf of Alexis de Talhouët <[email protected]
>>>>>>>>>>>>>     <mailto:[email protected]> 
>>>>>>>>>>>>> <mailto:[email protected]>>
>>>>>>>>>>>>>     Sent: Tuesday, February 9, 2016 00:45
>>>>>>>>>>>>>     To: [email protected]
>>>>>>>>>>>>>     <mailto:[email protected]> 
>>>>>>>>>>>>> <mailto:[email protected]>
>>>>>>>>>>>>>     Subject: [openflowplugin-dev] Scalability issues
>>>>>>>>>>>>>
>>>>>>>>>>>>>     Hello openflowplugin-dev,
>>>>>>>>>>>>>
>>>>>>>>>>>>>     I'm currently running some scalability tests against the
>>>>>>>>>>>>>     openflowplugin-li plugin, stable/lithium.
>>>>>>>>>>>>>     Playing with the CSIT job, I was able to connect up to 1090
>>>>>>>>>>>>>     switches: https://git.opendaylight.org/gerrit/#/c/33213/
>>>>>>>>>>>>>
>>>>>>>>>>>>>     I'm now running the test against 40 OvS switches, each of them in a
>>>>>>>>>>>>>     docker container.
>>>>>>>>>>>>>
>>>>>>>>>>>>>     Connecting around 30 of them works fine, but then adding a new one
>>>>>>>>>>>>>     completely breaks ODL; it goes crazy and unresponsive.
>>>>>>>>>>>>>     Attached is a snippet of the karaf.log with the log level set to DEBUG
>>>>>>>>>>>>>     for org.opendaylight.openflowplugin, so it's a really big log (~2.5MB).
>>>>>>>>>>>>>
>>>>>>>>>>>>>     Here is what I observed based on the log:
>>>>>>>>>>>>>     I have 30 switches connected, all works fine. Then I add a new one:
>>>>>>>>>>>>>     - SalRoleServiceImpl starts doing its thing (2016-02-08 
>>>>>>>>>>>>> 23:13:38,534)
>>>>>>>>>>>>>     - RpcManagerImpl Registering Openflow RPCs (2016-02-08 
>>>>>>>>>>>>> 23:13:38,546)
>>>>>>>>>>>>>     - ConnectionAdapterImpl Hello received (2016-02-08 
>>>>>>>>>>>>> 23:13:40,520)
>>>>>>>>>>>>>     - Creation of the transaction chain, …
>>>>>>>>>>>>>
>>>>>>>>>>>>>     Then it all starts falling apart with this log:
>>>>>>>>>>>>>>     2016-02-08 23:13:50,021 | DEBUG | ntLoopGroup-11-9 | 
>>>>>>>>>>>>>> ConnectionContextImpl            | 190 -
>>>>>>>>>>>>>>     org.opendaylight.openflowplugin.impl - 0.1.4.SNAPSHOT | 
>>>>>>>>>>>>>> disconnecting:
>>>>>>>>>>>>>>     node=/172.31.100.9:46736|auxId=0|connection state = RIP
>>>>>>>>>>>>>     And then ConnectionContextImpl disconnects the switches one by one,
>>>>>>>>>>>>>     RpcManagerImpl is unregistered, and then it goes crazy for a while.
>>>>>>>>>>>>>     But all I've done is add a new switch...
>>>>>>>>>>>>>
>>>>>>>>>>>>>     Finally, at 2016-02-08 23:14:26,666, exceptions are thrown:
>>>>>>>>>>>>>>     2016-02-08 23:14:26,666 | ERROR | lt-dispatcher-85 | 
>>>>>>>>>>>>>> LocalThreePhaseCommitCohort      | 172 -
>>>>>>>>>>>>>>     org.opendaylight.controller.sal-distributed-datastore - 
>>>>>>>>>>>>>> 1.2.4.SNAPSHOT | Failed to prepare transaction
>>>>>>>>>>>>>>     member-1-chn-5-txn-180 on backend
>>>>>>>>>>>>>>     akka.pattern.AskTimeoutException: Ask timed out on
>>>>>>>>>>>>>>     [ActorSelection[Anchor(akka://opendaylight-cluster-data/),
>>>>>>>>>>>>>>     
>>>>>>>>>>>>>> Path(/user/shardmanager-operational/member-1-shard-inventory-operational#-1518836725)]]
>>>>>>>>>>>>>>  after [30000 ms]
>>>>>>>>>>>>>     And it goes on for a while.
>>>>>>>>>>>>>
>>>>>>>>>>>>>     Do you have any input on this?
>>>>>>>>>>>>>
>>>>>>>>>>>>>     Could you give some advice on how to scale? (I know disabling the
>>>>>>>>>>>>>     StatisticsManager can help, for instance.)
>>>>>>>>>>>>>
>>>>>>>>>>>>>     Am I doing something wrong?
>>>>>>>>>>>>>
>>>>>>>>>>>>>     I can provide any requested information regarding the issue I'm facing.
>>>>>>>>>>>>>
>>>>>>>>>>>>>     Thanks,
>>>>>>>>>>>>>     Alexis
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>
>>>
>>>
>>>
>>>
>>
> 
_______________________________________________
openflowplugin-dev mailing list
[email protected]
https://lists.opendaylight.org/mailman/listinfo/openflowplugin-dev
