Hi, thank you very much for your reply. The open HCA problem has now come back :-(
Here is the log message:

[EMAIL PROTECTED] mpiexec -n 2 ./a.out
DAPL: NOT Setting Loopback
dapl_ib_init: ib_thread_init(12016)
dapl_ia_open (ib0, 8, 0x7ffffff28668, 0xd9da48)
open_hca: mthca0 - 0xdb3390
ib_thread(12016,0x40200960): ENTER: pipe 8 at 4
open_hca: Found dev mthca0 0002c902004002e8
open_hca: GID subnet fe80000000000000 id 0002c902004002e9
ips_by_gid: RET 0 at_rec 0x7ffffff283d0 -> id 2861
dapli_at_event_cb()
ip_comp_handler: rec 0x7ffffff283d0 ->id 2861 id 2861 num -22 3afa6000
ip_comp_handler: resolution err -22 retry 1
ip_comp_handler: ips_by_gid 0 rec 0x7ffffff283d0->id 2862
dapli_at_event_cb()
ip_comp_handler: rec 0x7ffffff283d0 ->id 2862 id 2862 num -22 0
ip_comp_handler: resolution err -22 retry 2
ip_comp_handler: ips_by_gid 0 rec 0x7ffffff283d0->id 2863
dapli_at_event_cb()
ip_comp_handler: rec 0x7ffffff283d0 ->id 2863 id 2863 num -22 0
ip_comp_handler: resolution err -22 retry 3
ip_comp_handler: ips_by_gid 0 rec 0x7ffffff283d0->id 2864
dapli_at_event_cb()
ip_comp_handler: rec 0x7ffffff283d0 ->id 2864 id 2864 num -22 0
ip_comp_handler: resolution err -22 retry 4
ip_comp_handler: ERR: at_rec 0x7ffffff283d0, id 2864 num -22
open_hca: ERR ib_at_ips_by_gid for mthca0
dapls_ib_open_hca failed 40000
dapl_ia_open () returns 0x40000
DAPL: Stopped (dapl_fini)
dapl_ib_release: ib_thread_destroy(12016)
ib_thread_destroy: waiting for ib_thread
ib_thread(12016) EXIT
[rdma_udapl_priv.c:640] error(262144): Cannot open IA
DAPL: NOT Setting Loopback
dapl_ib_init: ib_thread_init(11337)
dapl_ia_open (ib0, 8, 0x7fffffa8d618, 0xd9da48)
open_hca: mthca0 - 0xdb3390
ib_thread(11337,0x40800960): ENTER: pipe 8 at 4
open_hca: Found dev mthca0 0002c90200400314
open_hca: GID subnet fe80000000000000 id 0002c90200400315
ips_by_gid: RET 0 at_rec 0x7fffffa8d380 -> id 4627
dapli_at_event_cb()
ip_comp_handler: rec 0x7fffffa8d380 ->id 4627 id 4627 num -22 3c66c000
ip_comp_handler: resolution err -22 retry 1
ip_comp_handler: ips_by_gid 0 rec 0x7fffffa8d380->id 4628
dapli_at_event_cb()
ip_comp_handler: rec 0x7fffffa8d380 ->id 4628 id 4628 num -22 0
ip_comp_handler: resolution err -22 retry 2
[rdma_udapl_priv.c:640] error(262144): Cannot open IA
ip_comp_handler: ips_by_gid 0 rec 0x7fffffa8d380->id 4629
dapli_at_event_cb()
ip_comp_handler: rec 0x7fffffa8d380 ->id 4629 id 4629 num -22 0
ip_comp_handler: resolution err -22 retry 3
ip_comp_handler: ips_by_gid 0 rec 0x7fffffa8d380->id 4630
dapli_at_event_cb()
ip_comp_handler: rec 0x7fffffa8d380 ->id 4630 id 4630 num -22 0
ip_comp_handler: resolution err -22 retry 4
ip_comp_handler: ERR: at_rec 0x7fffffa8d380, id 4630 num -22
open_hca: ERR ib_at_ips_by_gid for mthca0
dapls_ib_open_hca failed 40000
dapl_ia_open () returns 0x40000
DAPL: Stopped (dapl_fini)
dapl_ib_release: ib_thread_destroy(11337)
ib_thread_destroy: waiting for ib_thread
ib_thread(11337) EXIT
ib_thread_destroy(12016) exit
rank 0 in job 421 ro0_33361 caused collective abort of all ranks
exit status of rank 0: return code 1

Any idea what is going on?

Thanks.
Lei

----- Original Message -----
From: Roland Dreier <[EMAIL PROTECTED]>
Date: Friday, October 21, 2005 7:48 pm
Subject: Re: [openib-general] uDAPL open HCA problem

> LEI> Hi, I'm from the same lab as Sayantan. Thanks for your
> LEI> suggestion. Currently we could not reproduce the problem,
> LEI> however, we meet another problem. When I try to tear down a
> LEI> connection between two nodes I often get some messages like
> LEI> this:
>
> LEI> [ 0] 005e0406 [ 4] 00000000 [ 8] 00000000 [ c] 00000000
> LEI> [10] 05f90000 [14] 00000000 [18] 00000008 [1c] fe100000
>
> That's OK, it's just showing that you polled a "work request flushed"
> status from a completion queue. The latest version of libmthca should
> no longer print these messages.
>
> - R.
>
_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
