Hello,

I have uDAPL over Gen2 set up on our cluster and am able to run uDAPL programs. However, I sometimes get this error (after a few runs of the same program):

open_hca: ERR ib_at_ips_by_gid for mthca0 dapls_ib_open_hca failed 40000

The machine is an AMD Opteron (Tyan S2895) with Mellanox MemFree cards (fw ver 5.1.0). lsmod on my machine shows this:

[EMAIL PROTECTED]:~] lsmod | grep ^ib
ib_ipoib               48008  0
ib_uat                 14840  0
ib_at                  25696  1 ib_uat
ib_sa                  17804  2 ib_ipoib,ib_at
ib_ucm                 22280  0
ib_cm                  37744  1 ib_ucm
ib_uverbs              35992  0
ib_umad                18208  0
ib_mthca              122656  0
ib_mad                 44072  4 ib_sa,ib_cm,ib_umad,ib_mthca
ib_core                56192  8 ib_ipoib,ib_sa,ib_ucm,ib_cm,ib_uverbs,ib_umad,ib_mthca,ib_mad

My InfiniBand device nodes (created by hand) are:

[EMAIL PROTECTED]:~] ls -l /dev/infiniband/
total 0
crw-rw-rw- 1 root root 231, 191 2005-10-20 21:13 uat
crw-rw-rw- 1 root root 231, 224 2005-10-20 21:12 ucm0
crwxrwxrwx 1 root root 231, 192 2005-09-21 04:37 umad0
crwxrwxrwx 1 root root 231, 192 2005-09-16 19:29 uverbs0
crwxrwxrwx 1 root root 231, 192 2005-09-16 19:29 uverbs1

I'd really appreciate it if someone could help me understand what might be going wrong.

Thanks,
Sayantan.

--
http://www.cse.ohio-state.edu/~surs

_______________________________________________
openib-general mailing list
http://openib.org/mailman/listinfo/openib-general
