Arlin Davis wrote:

Aniruddha Bohra wrote:

I am not sure, but aren't uCM and uAT simply for connection establishment?

Yes, but they also set up many of the transfer attributes of the connected QP. The uCM/uAT version uses path_records from the SA query, but the socket_CM version just builds them by hand, similar to the way ibv_rc_pingpong does. You would have to look at the pathrecord->pktlifetime to see the actual timeout value being used.

OK, I added some debug output, and the path record returned from uAT looks suspect. Here are the results from tuAT and opensm running on my cluster. Path record pktlife is 0 (uCM adds 1), so the ACK timeout value for this connection will be very short.

path_comp_handler: ctxt 0x525fa0, req_id 90 rec_num 1
path_comp_handler: SRC GID subnet fe80000000000000 id 0002c9020000409d
path_comp_handler: DST GID subnet fe80000000000000 id 0002c90200004071
path_comp_handler: slid 5 dlid 2 mtu 120203(2) pktlife 0(0) <<< ???
path_comp_handler: hops 0 npaths 0 pkey ffff tclass 0 rate 0(0) <<< ???
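
To put a number on "very short": InfiniBand encodes both the packet lifetime and the QP local ACK timeout as an exponent, where the time is 4.096 usec * 2^value. Below is a minimal sketch of that arithmetic, not the actual uCM code. The helper names are hypothetical, and the "+ 1" step is taken from the description above rather than from the source.

```c
#include <stdio.h>

/* Hypothetical helper: decode an IB timeout exponent into microseconds.
 * The encoding is 4.096 usec * 2^encoded. For the QP local ACK timeout
 * field, an encoded value of 0 means "no timeout", so callers should
 * treat that case separately. */
static double ib_timeout_usec(unsigned int encoded)
{
        /* Timeout fields are at most 5 or 6 bits wide; mask to 5 bits here. */
        return 4.096 * (double)(1ULL << (encoded & 0x1f));
}

/* Hypothetical helper: derive the ACK timeout exponent from the path
 * record packet lifetime, assuming the "pktlife + 1" rule described
 * in this thread. */
static unsigned int ack_timeout_from_pktlife(unsigned int pktlife)
{
        return (pktlife + 1) & 0x1f;
}

/* Print the resulting ACK timeout for a given packet lifetime exponent. */
static void print_ack_timeout(unsigned int pktlife)
{
        unsigned int t = ack_timeout_from_pktlife(pktlife);

        printf("pktlife %u -> ack timeout %u (%.3f usec)\n",
               pktlife, t, ib_timeout_usec(t));
}
```

With the pktlife of 0 shown in the log, print_ack_timeout(0) gives an exponent of 1, which is roughly 8 usec. For comparison, an exponent of 14 works out to roughly 67 msec.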

Hal, can you take a look at uAT and see if the copy to user space is working correctly?

Aniruddha, can you apply the following patch and send us the output from your run?

-arlin

Signed-off-by: Arlin Davis <[EMAIL PROTECTED]>

Index: dapl/openib/dapl_ib_cm.c
===================================================================
--- dapl/openib/dapl_ib_cm.c    (revision 3951)
+++ dapl/openib/dapl_ib_cm.c    (working copy)
@@ -136,14 +136,27 @@

       dapl_dbg_log(DAPL_DBG_TYPE_CM,
               " path_comp_handler: SRC GID subnet %016llx id %016llx\n",
-               (unsigned long long)cpu_to_be64(conn->dapl_rt.sgid.global.subnet_prefix),
-               (unsigned long long)cpu_to_be64(conn->dapl_rt.sgid.global.interface_id) );
+               (unsigned long long)cpu_to_be64(conn->dapl_path.sgid.global.subnet_prefix),
+               (unsigned long long)cpu_to_be64(conn->dapl_path.sgid.global.interface_id) );

       dapl_dbg_log(DAPL_DBG_TYPE_CM,
               " path_comp_handler: DST GID subnet %016llx id %016llx\n",
-               (unsigned long long)cpu_to_be64(conn->dapl_rt.dgid.global.subnet_prefix),
-               (unsigned long long)cpu_to_be64(conn->dapl_rt.dgid.global.interface_id) );
+               (unsigned long long)cpu_to_be64(conn->dapl_path.dgid.global.subnet_prefix),
+               (unsigned long long)cpu_to_be64(conn->dapl_path.dgid.global.interface_id) );

+       dapl_dbg_log(DAPL_DBG_TYPE_CM,
+               " path_comp_handler: slid %x dlid %x mtu %x(%x) pktlife %x(%x)\n",
+               ntohs(conn->dapl_path.slid), ntohs(conn->dapl_path.dlid),
+               conn->dapl_path.mtu, conn->dapl_path.mtu_selector,
+               conn->dapl_path.packet_life_time,
+               conn->dapl_path.packet_life_time_selector );
+
+       dapl_dbg_log(DAPL_DBG_TYPE_CM,
+               " path_comp_handler: hops %x npaths %x pkey %x tclass %x rate %x(%x)\n",
+               conn->dapl_path.hop_limit, conn->dapl_path.numb_path,
+               conn->dapl_path.pkey, conn->dapl_path.traffic_class,
+               conn->dapl_path.rate, conn->dapl_path.rate_selector);
+
       if (rec_num <= 0) {
               dapl_dbg_log(DAPL_DBG_TYPE_CM,
                            " path_comp_handler: ERR %d retry %d\n",



_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general
