Re: [openflowplugin-dev] Scalability issues

Michal Rehak -X (mirehak - PANTHEON TECHNOLOGIES at Cisco) Tue, 09 Feb 2016 00:43:36 -0800

Hi Alexis,
in the Li design, the statistics manager is not a standalone app but part of 
the core of ofPlugin. You can disable it via RPC.


Just a question regarding your OvS setup: are all your DPIDs unique? 
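A quick way to check this is sketched below. It assumes you have already collected each switch's datapath ID (for example with `ovs-vsctl get Bridge <bridge> datapath_id` inside each container) into a dictionary; the helper name and the container names in the sample are hypothetical.

```python
from collections import defaultdict

def find_duplicate_dpids(dpids_by_switch):
    """Return {dpid: [switch, ...]} for every DPID used by more than one switch."""
    owners = defaultdict(list)
    for switch, dpid in dpids_by_switch.items():
        # ovs-vsctl prints the value in double quotes; normalise before comparing.
        owners[dpid.strip().strip('"').lower()].append(switch)
    return {dpid: sorted(names) for dpid, names in owners.items() if len(names) > 1}

# Hypothetical sample data: two containers ended up with the same DPID.
sample = {
    "ovs-1": '"0000000000000001"',
    "ovs-2": '"0000000000000002"',
    "ovs-3": '"0000000000000001"',
}
print(find_duplicate_dpids(sample))
```

An empty result means every switch presents a distinct DPID to the controller.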

Also, Netty is limited by the number of threads it uses. By default it uses 
2 x the number of CPU cores, so you should have as many cores as possible in 
order to get maximum performance.
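To illustrate how that default scales with the machine, here is a minimal sketch of the "2 x cores" rule stated above. It assumes the event-loop group is created with Netty's default size; the function name is hypothetical.

```python
import os

def default_netty_event_loop_threads(cpu_cores=None):
    """Approximate Netty's default event-loop size: 2 x CPU cores, at least 1."""
    if cpu_cores is None:
        cpu_cores = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, 2 * cpu_cores)

for cores in (2, 4, 8):
    print(f"{cores} cores -> {default_netty_event_loop_threads(cores)} Netty threads")
print("this machine:", default_netty_event_loop_threads())
```

With a few cores, dozens of switch connections end up sharing a small, fixed pool of I/O threads.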



Regards,
Michal



________________________________________
From: [email protected] 
<[email protected]> on behalf of Alexis de 
Talhouët <[email protected]>
Sent: Tuesday, February 9, 2016 00:45
To: [email protected]
Subject: [openflowplugin-dev] Scalability issues

Hello openflowplugin-dev,

I’m currently running some scalability tests against the openflowplugin-li 
plugin, stable/lithium.
Playing with the CSIT job, I was able to connect up to 1090 switches: 
https://git.opendaylight.org/gerrit/#/c/33213/

I’m now running the test against 40 OvS switches, each one in its own 
Docker container.

Connecting around 30 of them works fine, but then adding one more completely 
breaks ODL: it goes crazy and becomes unresponsive.
Attached is a snippet of the karaf.log with the log level set to DEBUG for 
org.opendaylight.openflowplugin, so it’s a really big log (~2.5MB).

Here is what I observed in the log:
I have 30 switches connected, all works fine. Then I add a new one:
 - SalRoleServiceImpl starts doing its thing (2016-02-08 23:13:38,534)
 - RpcManagerImpl Registering Openflow RPCs (2016-02-08 23:13:38,546)
 - ConnectionAdapterImpl Hello received (2016-02-08 23:13:40,520)
 - Creation of the transaction chain, …

Then everything starts falling apart with this log:
> 2016-02-08 23:13:50,021 | DEBUG | ntLoopGroup-11-9 | ConnectionContextImpl    
>         | 190 - org.opendaylight.openflowplugin.impl - 0.1.4.SNAPSHOT | 
> disconnecting: node=/172.31.100.9:46736|auxId=0|connection state = RIP
And then ConnectionContextImpl disconnects the switches one by one, and 
RpcManagerImpl is unregistered.
Then it goes crazy for a while.
But all I’ve done is add a new switch.
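To pull the disconnect events out of a ~2.5MB DEBUG log, a small filter helps. This is a minimal sketch that assumes every disconnect line follows the `disconnecting: node=/IP:PORT|auxId=N|connection state = STATE` format quoted above, with the timestamp in the first pipe-separated field; the function names are hypothetical.

```python
import re

# Matches the message part of the ConnectionContextImpl line quoted above.
DISCONNECT_RE = re.compile(
    r"disconnecting: node=/(?P<addr>[\d.]+):(?P<port>\d+)"
    r"\|auxId=(?P<aux>\d+)\|connection state = (?P<state>\w+)"
)

def parse_disconnect(line):
    """Return (timestamp, address, port, auxId, state), or None for other lines."""
    match = DISCONNECT_RE.search(line)
    if match is None:
        return None
    timestamp = line.split("|", 1)[0].strip()  # first field is the timestamp
    return (timestamp, match["addr"], int(match["port"]),
            int(match["aux"]), match["state"])

def disconnects(lines):
    """Collect every disconnect event from an iterable of log lines."""
    return [event for event in map(parse_disconnect, lines) if event is not None]

sample = ("2016-02-08 23:13:50,021 | DEBUG | ntLoopGroup-11-9 | ConnectionContextImpl"
          " | 190 - org.opendaylight.openflowplugin.impl - 0.1.4.SNAPSHOT"
          " | disconnecting: node=/172.31.100.9:46736|auxId=0|connection state = RIP")
print(parse_disconnect(sample))
```

Passing an open `karaf.log` file object to `disconnects()` gives the order in which switches were dropped.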

Finally, at 2016-02-08 23:14:26,666, exceptions are thrown:
> 2016-02-08 23:14:26,666 | ERROR | lt-dispatcher-85 | 
> LocalThreePhaseCommitCohort      | 172 - 
> org.opendaylight.controller.sal-distributed-datastore - 1.2.4.SNAPSHOT | 
> Failed to prepare transaction member-1-chn-5-txn-180 on backend
> akka.pattern.AskTimeoutException: Ask timed out on 
> [ActorSelection[Anchor(akka://opendaylight-cluster-data/), 
> Path(/user/shardmanager-operational/member-1-shard-inventory-operational#-1518836725)]]
>  after [30000 ms]
And it goes on like that for a while.
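Putting the timestamps quoted in this message on one timeline makes the sequence easier to reason about. This sketch uses only those timestamps plus the 30000 ms ask timeout from the exception; the names are hypothetical.

```python
from datetime import datetime, timedelta

KARAF_TS = "%Y-%m-%d %H:%M:%S,%f"  # layout of "2016-02-08 23:13:38,534"

# Timestamps copied from the observations and log excerpts in this message.
EVENTS = {
    "SalRoleServiceImpl starts (new switch)": "2016-02-08 23:13:38,534",
    "Hello received": "2016-02-08 23:13:40,520",
    "first disconnect (state RIP)": "2016-02-08 23:13:50,021",
    "AskTimeoutException logged": "2016-02-08 23:14:26,666",
}
ASK_TIMEOUT = timedelta(milliseconds=30000)  # "after [30000 ms]" in the exception

def parse_ts(text):
    return datetime.strptime(text, KARAF_TS)

def elapsed_seconds(events):
    """Seconds from the first listed event to each event (dicts keep insertion order)."""
    start = parse_ts(next(iter(events.values())))
    return {name: (parse_ts(t) - start).total_seconds() for name, t in events.items()}

for name, secs in elapsed_seconds(EVENTS).items():
    print(f"{secs:7.3f} s  {name}")

# The ask times out 30 s after it is sent, so the failed prepare was sent no later than:
print("ask sent no later than", parse_ts(EVENTS["AskTimeoutException logged"]) - ASK_TIMEOUT)
```

By these numbers the first disconnect comes about 11.5 s after the new switch appears, and the failed prepare request was sent no later than 23:13:56,666, after the disconnects had already started.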

Do you have any input on this?

Could you give some advice on how to scale? (I know disabling the 
StatisticManager can help, for instance.)

Am I doing something wrong?

I can provide any requested information regarding the issue I’m facing.

Thanks,
Alexis


_______________________________________________
openflowplugin-dev mailing list
[email protected]
https://lists.opendaylight.org/mailman/listinfo/openflowplugin-dev
