Re: [gpfsug-discuss] VERBS RDMA issue

2017-05-21 Thread Tushar Pathare
Thanks Sven. Will read more about it and discuss with the team to come to a conclusion. Thank you for pointing out the param. Will let you know the results after the tuning. Tushar B Pathare MBA IT, BE IT Bigdata & GPFS Software Development & Databases Scientific Computing Bioinformatics Division

Re: [gpfsug-discuss] VERBS RDMA issue

2017-05-21 Thread Sven Oehme
The reason is the default setting, verbsRdmasPerConnection: 16. You can increase this; on some smaller clusters I run with 1024, but it's not advised to run that on hundreds of nodes, and not unless you know exactly what you are doing. I would start by doubling it to 32 and see how much of
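
A minimal sketch of the "start by doubling" approach, assuming the default of 16 and the 1024 upper end mentioned above. The helper names are hypothetical. The command string uses the usual `mmchconfig attribute=value` form; the thread does not say which nodes to apply it to or whether a daemon restart is needed, so check both before running anything.

```python
def doubling_steps(start=16, ceiling=1024):
    """Return the values reached by repeatedly doubling `start`, up to `ceiling`."""
    if start <= 0 or ceiling < start:
        raise ValueError("start must be positive and no larger than ceiling")
    steps = []
    value = start * 2
    while value <= ceiling:
        steps.append(value)
        value *= 2
    return steps


def mmchconfig_command(value):
    """Build (not run) the command string for one candidate value."""
    return f"mmchconfig verbsRdmasPerConnection={value}"


if __name__ == "__main__":
    # Print each candidate so it can be tried one at a time,
    # checking the waiters after every change before going higher.
    for candidate in doubling_steps():
        print(mmchconfig_command(candidate))
```

Stepping one value at a time and re-checking the waiters after each change keeps the setting as low as the workload allows, which matters on larger clusters.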

Re: [gpfsug-discuss] VERBS RDMA issue

2017-05-21 Thread Tushar Pathare
Hello Aaron, Yes, we recently saw an issue with
VERBS RDMA rdma send error IBV_WC_RETRY_EXC_ERR to 111.11.11.11 (sidra.nnode_group2.gpfs) on mlx5_0 port 2 fabnum 0 vendor_err 129
and
VERBS RDMA rdma write error IBV_WC_REM_ACCESS_ERR to 112.11.11.11 (sidra.snode_group2.gpfs) on mlx5_0 port 2
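
When there are many of these messages, it can help to pull out the fields and group them by destination or status. The following is a rough sketch that assumes the log lines follow the layout quoted above; the pattern and the function name are illustrative and have not been checked against other GPFS releases.

```python
import re

# Pattern modelled only on the two messages quoted in this thread.
VERBS_ERROR = re.compile(
    r"VERBS RDMA rdma (?P<op>\w+) error (?P<status>IBV_WC_\w+) "
    r"to (?P<ip>\d+\.\d+\.\d+\.\d+) \(\s*(?P<node>[^)\s]+)\s*\) "
    r"on (?P<device>\S+) port (?P<port>\d+)"
    r"(?: fabnum (?P<fabnum>\d+))?(?: vendor_err (?P<vendor_err>\d+))?"
)


def parse_verbs_errors(text):
    """Return one dict per VERBS RDMA error found in `text`.

    Fields missing from a message (for example a cut-off line) are None.
    """
    return [match.groupdict() for match in VERBS_ERROR.finditer(text)]


if __name__ == "__main__":
    sample = (
        "VERBS RDMA rdma send error IBV_WC_RETRY_EXC_ERR to 111.11.11.11 "
        "(sidra.nnode_group2.gpfs) on mlx5_0 port 2 fabnum 0 vendor_err 129"
    )
    for entry in parse_verbs_errors(sample):
        print(entry["op"], entry["status"], entry["node"], entry["vendor_err"])
```

Grouping the parsed entries by `node` or `status` is a quick way to see whether the errors point at one peer or port rather than the whole fabric.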

Re: [gpfsug-discuss] VERBS RDMA issue

2017-05-21 Thread Tushar Pathare
Hello Aaron, Yes, we recently saw an issue with VERBS RDMA rdma send error IBV_WC_RETRY_EXC_ERR to 111.11.11.11 (sidra.nnode_group2.gpfs) on mlx5_0 port 2 fabnum 0 vendor_err 129 And Tushar B Pathare MBA IT, BE IT Bigdata & GPFS Software Development & Databases Scientific Computing

Re: [gpfsug-discuss] VERBS RDMA issue

2017-05-21 Thread Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]
Hi Tushar, For me the issue was an underlying performance bottleneck (some CPU frequency scaling problems causing cores to throttle back when it wasn't appropriate). I noticed you have verbsRdmaSend set to yes. I've seen suggestions in the past to turn this off under certain conditions

[gpfsug-discuss] VERBS RDMA issue

2017-05-21 Thread Tushar Pathare
Hello Team, We are seeing a lot of waiter messages related to "waiting for conn rdmas < conn maxrdmas". Is there some
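
To judge whether a tuning change helps, one option is to count how many waiters carry this reason before and after the change. A small sketch, assuming the waiter output (for example saved from `mmdiag --waiters`) has been read into a list of lines. Only the reason string comes from this thread; the function name and the rest of each sample line are made up for illustration.

```python
WAITER_REASON = "waiting for conn rdmas < conn maxrdmas"


def count_rdma_waiters(lines):
    """Count the lines that mention the RDMA connection-limit waiter reason."""
    return sum(1 for line in lines if WAITER_REASON in line)


if __name__ == "__main__":
    # Hypothetical sample lines; real waiter output carries more detail.
    sample = [
        "waiter A: reason 'waiting for conn rdmas < conn maxrdmas'",
        "waiter B: reason 'some other condition'",
        "waiter C: reason 'waiting for conn rdmas < conn maxrdmas'",
    ]
    print(count_rdma_waiters(sample))
```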