Hi Hal, I've reviewed the code and as you noted - there is a bug in the code, actually 2 bugs. The qp0_mads_outstanding counter is incremented on mads with response expected. In the "regular" flow this counter is decremented through __osm_sm_mad_ctrl_retire_trans_mad, after the response was received and the mad was handled. The qp0_mads_outstanding is used for the signalling of "NO_PENDING_TRANSACTIONS", so I do not think this counter should be incremented when the mads are not ones with response expected. The 2 bugs are, thus: 1. In the _osm_vl15_poller in case of osm_vendor_send failure. This code is similar to the one in __osm_sm_mad_ctrl_retire_trans_mad, but should decrement and handle the qp0_mads_outstanding only if response_expected == TRUE. 2. In the shutdown process. Again, only mads with response_expected == TRUE should decrement the counter. The reason we haven't seen errors due to this is because the main and usual flow is fine. I will issue a patch for fixing this soon. Thanks, Yael
-----Original Message----- From: Eitan Zahavi Sent: Saturday, December 31, 2005 11:43 AM To: 'Hal Rosenstock'; Yael Kalka Cc: openib-general@openib.org Subject: RE: Some opensm/osm_vl15intf.c questions Hi Hal, As Yael was working on the ref-counting issues (a month or two ago) I will let her answer. It is very possible we are missing some. Eitan Zahavi Design Technology Director Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: Hal Rosenstock [mailto:[EMAIL PROTECTED] > Sent: Friday, December 30, 2005 6:03 PM > To: Eitan Zahavi > Cc: openib-general@openib.org > Subject: Some opensm/osm_vl15intf.c questions > > Hi Eitan, > > In chasing an issue with a trap repress not being sent in a certain > scenario, I stumbled across the following questions about > opensm/osm_vl15intf.c. > > 1. osm_vl15_post increments qp0_mads_outstanding when a response is > expected (rfifo) and not when unsolicited (ufifo) (what appears to be > called unicasts): > > osm_vl15_post: > if( p_madw->resp_expected == TRUE ) > { > cl_qlist_insert_tail( &p_vl->rfifo, (cl_list_item_t*)p_madw ); > cl_atomic_inc( &p_vl->p_stats->qp0_mads_outstanding ); > } > else > { > cl_qlist_insert_tail( &p_vl->ufifo, (cl_list_item_t*)p_madw ); > } > > osm_vl15_shutdown retires all outstanding MADs as follows: > > osm_vl15_shutdown: > while ( p_madw != (osm_madw_t*)cl_qlist_end( &p_vl->ufifo ) ) > { > if( osm_log_is_active( p_vl->p_log, OSM_LOG_DEBUG ) ) > { > osm_log( p_vl->p_log, OSM_LOG_DEBUG, > "osm_vl15_shutdown: " > "Releasing Response p_madw = %p\n", p_madw ); > } > > osm_mad_pool_put( p_mad_pool, p_madw ); > cl_atomic_dec( &p_vl->p_stats->qp0_mads_outstanding ); > > p_madw = (osm_madw_t*)cl_qlist_remove_head( &p_vl->ufifo ); > } > > Either post should increment qp0_mads_outstanding for unsolicited or > shutdown shouldn't decrement it when removing from ufifo. If you agree, > which should it be ? > > 2. In the case of a failure from osm_vendor_send, __osm_vl15_poller > decrements qp0_mads_outstanding regardless of whether a response is > expected. This is inconsistent with the increment. This leads me to > believe that this should also be incremented for unsolicited (unicasts) > as well as those for which responses are expected. Is this correct or am > I missing something ? > > So my conclusion is that in osm_vl15_post, it should be: > > if( p_madw->resp_expected == TRUE ) > { > cl_qlist_insert_tail( &p_vl->rfifo, (cl_list_item_t*)p_madw ); > } > else > { > cl_qlist_insert_tail( &p_vl->ufifo, (cl_list_item_t*)p_madw ); > } > cl_atomic_inc( &p_vl->p_stats->qp0_mads_outstanding ); > > If you agree, I will generate a patch for this. Thanks. > > -- Hal _______________________________________________ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general