Hi Alexis,

In the Li design, the statistics manager is not a standalone app but part of the core of ofPlugin. You can disable it via RPC.
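For reference, here is a minimal sketch of what such an RPC call might look like over RESTCONF. The RPC name (statistics-manager-control:change-statistics-work-mode), the FULLYDISABLED mode value, port 8181 and the admin/admin credentials are assumptions that are not stated in this thread, so please check them against the YANG models in your build before relying on it. The helper only builds the request; actually sending it is left commented out.

```python
import base64
import json
import urllib.request


def build_disable_stats_request(host="127.0.0.1", port=8181,
                                user="admin", password="admin"):
    """Build (but do not send) a RESTCONF RPC request that asks the
    statistics manager to stop collecting statistics."""
    # ASSUMPTION: the RPC module/name and the mode value below are not taken
    # from this thread; verify them against the YANG models of your build.
    url = ("http://%s:%d/restconf/operations/"
           "statistics-manager-control:change-statistics-work-mode"
           % (host, port))
    body = json.dumps({"input": {"mode": "FULLYDISABLED"}}).encode("utf-8")
    # ASSUMPTION: HTTP basic auth with default-style credentials.
    token = base64.b64encode(
        ("%s:%s" % (user, password)).encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
    )


if __name__ == "__main__":
    req = build_disable_stats_request()
    print(req.get_method(), req.full_url)
    # To send it against a running controller:
    # with urllib.request.urlopen(req, timeout=10) as resp:
    #     print(resp.status)
```

If the RPC exists under a different name in your build, only the URL and the input body should need to change.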
Just a question regarding your OVS setup: are all your DPIDs unique? Also, netty limits the number of threads it uses. By default it uses 2 x the number of CPU cores. You should have as many cores as possible in order to get maximum performance.

Regards,
Michal

________________________________________
From: [email protected] <[email protected]> on behalf of Alexis de Talhouët <[email protected]>
Sent: Tuesday, February 9, 2016 00:45
To: [email protected]
Subject: [openflowplugin-dev] Scalability issues

Hello openflowplugin-dev,

I’m currently running some scalability tests against the openflowplugin-li plugin, stable/lithium. Playing with the CSIT job, I was able to connect up to 1090 switches: https://git.opendaylight.org/gerrit/#/c/33213/

I’m now running the test against 40 OvS switches, each of them in a docker container.

Connecting around 30 of them works fine, but then adding a new one completely breaks ODL: it goes crazy and becomes unresponsive. Attached is a snippet of the karaf.log with the log level set to DEBUG for org.opendaylight.openflowplugin, so it’s a really big log (~2.5MB).

Here is what I observed based on the log. I have 30 switches connected, all works fine. Then I add a new one:
- SalRoleServiceImpl starts doing its thing (2016-02-08 23:13:38,534)
- RpcManagerImpl Registering Openflow RPCs (2016-02-08 23:13:38,546)
- ConnectionAdapterImpl Hello received (2016-02-08 23:13:40,520)
- Creation of the transaction chain, …

Then everything starts falling apart with this log:

> 2016-02-08 23:13:50,021 | DEBUG | ntLoopGroup-11-9 | ConnectionContextImpl
> | 190 - org.opendaylight.openflowplugin.impl - 0.1.4.SNAPSHOT |
> disconnecting: node=/172.31.100.9:46736|auxId=0|connection state = RIP

And then ConnectionContextImpl disconnects the switches one by one, and RpcManagerImpl is unregistered. Then it goes crazy for a while. But all I’ve done is add a new switch.
Finally, at 2016-02-08 23:14:26,666, exceptions are thrown:

> 2016-02-08 23:14:26,666 | ERROR | lt-dispatcher-85 |
> LocalThreePhaseCommitCohort | 172 -
> org.opendaylight.controller.sal-distributed-datastore - 1.2.4.SNAPSHOT |
> Failed to prepare transaction member-1-chn-5-txn-180 on backend
> akka.pattern.AskTimeoutException: Ask timed out on
> [ActorSelection[Anchor(akka://opendaylight-cluster-data/),
> Path(/user/shardmanager-operational/member-1-shard-inventory-operational#-1518836725)]]
> after [30000 ms]

And it goes on like this for a while.

Do you have any input on this? Could you give some advice on how to scale? (I know disabling the StatisticManager can help, for instance.) Am I doing something wrong?

I can provide any information you need regarding the issue I’m facing.

Thanks,
Alexis

_______________________________________________
openflowplugin-dev mailing list
[email protected]
https://lists.opendaylight.org/mailman/listinfo/openflowplugin-dev
