Re: better understanding rdma-cm UNREACHABLE/ETIMEDOUT scheme

2013-05-23 Thread Alex Rosenbaum
On 5/21/2013 6:24 PM, Hefty, Sean wrote: My first guess is that the server isn't responding to new requests. - Sean This is where we're looking now. We are now testing on 17 servers with 8 clients per server. When we disable all RDMA traffic in the test, 100% of the RDMA connections are established. So at

Warning about possible recursive locking detected in IPoIB

2013-05-23 Thread Jack Wang
Hi Or, I saw the warning below when I enabled CONFIG_DEBUG_MUTEXES:
May 21 08:56:32 ib2 kernel: [ 44.738725] =
May 21 08:56:32 ib2 kernel: [ 44.738782] [ INFO: possible recursive locking detected ]
May 21 08:56:32 ib2 kernel: [

Re: BUG: unable to handle kernel paging request at 0000000000070a78 IPoIB

2013-05-23 Thread Jack Wang
On 05/21/2013 05:19 PM, Jack Wang wrote: On 05/21/2013 02:51 PM, Sebastian Riemer wrote: On 17.05.2013 16:16, Jack Wang wrote: unable to handle kernel paging request Hi Jack, this should be related to the list corruption in IPoIB as list_del() sets the LIST_POISON1 and LIST_POISON2

Re: mlx4/xrc problem

2013-05-23 Thread Steve Wise
On 5/22/2013 12:38 PM, Steve Wise wrote: On 5/22/2013 11:39 AM, Hefty, Sean wrote:
[root@hpc-hn1 libibverbs-1.1.4]# ibv_xsrq_pingpong -d mlx4_0 192.168.174.52
local: LID 0001, QPN RECV 98004b SEND 18004c, PSN 5b6d99, SRQN 0042
remote: LID 0002, QPN RECV d4004a SEND 54004b, PSN

Re: BUG: unable to handle kernel paging request at 0000000000070a78 IPoIB

2013-05-23 Thread Doug Ledford
On 05/23/2013 11:38 AM, Jack Wang wrote: Tainted: G O 3.4.23-pserver-hotfix+ #109 System manufacturer ^^^ I would try a newer kernel. There are a couple known issues fixed since this kernel (including a memory corrupter that was involved with neighbor list

Re: BUG: unable to handle kernel paging request at 0000000000070a78 IPoIB

2013-05-23 Thread Jack Wang
On 05/23/2013 19:41, Doug Ledford wrote: On 05/23/2013 11:38 AM, Jack Wang wrote: Tainted: G O 3.4.23-pserver-hotfix+ #109 System manufacturer ^^^ I would try a newer kernel. There are a couple known issues fixed since this kernel (including a memory

Re: BUG: unable to handle kernel paging request at 0000000000070a78 IPoIB

2013-05-23 Thread Doug Ledford
On 05/23/2013 02:53 PM, Jack Wang wrote: On 05/23/2013 19:41, Doug Ledford wrote: On 05/23/2013 11:38 AM, Jack Wang wrote: Tainted: G O 3.4.23-pserver-hotfix+ #109 System manufacturer ^^^ I would try a newer kernel. There are a couple known issues