Hi Eitan,

On Wed, 2005-11-09 at 02:46, Eitan Zahavi wrote:
> Hi Hal,
>
> I would like to bring this to MgtWG before we change anything.
> IMO the situation when this happens is really not "legal" since if the
> SMs are not coordinated at least in their SM_Key it will cause two
> masters on the subnet.

Correct. That's what the current compliance says.

> From our experience it is always better to cause a fatal flow and exit
> the SM rather than report the event in some log - normally it will not
> be seen ...

To upper layer management too (not just in a log). It's more than just
reporting in a log; it's the exiting which relinquishes the subnet.

> I know this is a controversial issue.

Feel free to bring this up at the MgtWG.

> BTW: Another feature I would like to bring up is the SM behavior when it
> recognizes duplicated GUID on the subnet. Currently it will just issue
> an error in the log file.
> I would propose to make it abort after sending a log event describing
> the DR paths to these two devices.
>
> What do you say?

If aborting means exiting (terminating) the SM in this case, I think
that is not a good thing and should be avoided.

In the case of a duplicated GUID (which should not occur), a choice
needs to be made as to which one to honor. The other should be ignored.
The two ways I can envision this happening are:
(1) duplication of GUID in multiple nodes (bad manufacturing process), and
(2) SM bug of some sort.

-- Hal

> EZ
>
> Eitan Zahavi
> Design Technology Director
> Mellanox Technologies LTD
> Tel:+972-4-9097208
> Fax:+972-4-9593245
> P.O. Box 586 Yokneam 20692 ISRAEL
>
>
> > -----Original Message-----
> > From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
> > Sent: Tuesday, November 08, 2005 11:09 PM
> > To: [email protected]
> > Subject: [openib-general] OpenSM and Wrong SM_Key
> >
> > Hi,
> >
> > Currently, when OpenSM receives SMInfo with a different SM_Key, it
> > exits as follows:
> >
> > void
> > __osm_sminfo_rcv_process_get_response(
> >   IN const osm_sminfo_rcv_t* const p_rcv,
> >   IN const osm_madw_t* const p_madw )
> > {
> >   ...
> >
> >   /*
> >      Check that the sm_key of the found SM is the same as ours,
> >      or is zero. If not - OpenSM cannot continue with configuration!
> >   */
> >   if ( p_smi->sm_key != 0 &&
> >        p_smi->sm_key != p_rcv->p_subn->opt.sm_key )
> >   {
> >     osm_log( p_rcv->p_log, OSM_LOG_ERROR,
> >              "__osm_sminfo_rcv_process_get_response: ERR 2F18: "
> >              "Got SM with sm_key that doesn't match our "
> >              "local key. Exiting\n" );
> >     osm_log( p_rcv->p_log, OSM_LOG_SYS,
> >              "Found remote SM with non-matching sm_key. Exiting\n" );
> >     osm_exit_flag = TRUE;
> >     goto Exit;
> >   }
> >
> > C14-61.2.1 states that:
> > A master SM which finds a higher priority master SM with the wrong
> > SM_Key should not relinquish the subnet.
> >
> > Exiting OpenSM relinquishes the subnet.
> >
> > So it appears to me that perhaps this behavior of exiting OpenSM
> > should be at least contingent on the SM state and relative priority
> > of the SMInfo received. Make sense? If so, I will work on a patch
> > for this.
> >
> > -- Hal
> >
> > _______________________________________________
> > openib-general mailing list
> > [email protected]
> > http://openib.org/mailman/listinfo/openib-general
> >
> > To unsubscribe, please visit
> > http://openib.org/mailman/listinfo/openib-general
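
For illustration only, here is a minimal sketch of how the exit could be
made contingent on the local SM state, as suggested in the original
message. It is not the OpenSM patch: the type and function names
(sm_state_t, sm_key_action_t, sm_key_action) are hypothetical, and the
policy is an assumption, namely that a local master reports the mismatch
but keeps the subnet, while an SM in any other state keeps the current
behavior of exiting. The mismatch test mirrors the quoted check (a remote
key of zero is accepted).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration; these are not OpenSM's types. */
typedef enum {
    SM_STATE_NOT_ACTIVE,
    SM_STATE_DISCOVERING,
    SM_STATE_STANDBY,
    SM_STATE_MASTER
} sm_state_t;

typedef enum {
    SM_KEY_OK,              /* keys agree, or the remote key is zero   */
    SM_KEY_REPORT_AND_STAY, /* log/report the mismatch, keep the subnet */
    SM_KEY_EXIT             /* current behavior: report and exit        */
} sm_key_action_t;

/* Same condition as the quoted code: a non-zero remote key that differs
   from the local key counts as a mismatch. */
bool sm_key_mismatch(uint64_t local_key, uint64_t remote_key)
{
    return remote_key != 0 && remote_key != local_key;
}

/* Assumed policy: a master SM must not relinquish the subnet because of
   a wrong SM_Key, so it only reports; any other state exits as today. */
sm_key_action_t sm_key_action(sm_state_t local_state,
                              uint64_t local_key,
                              uint64_t remote_key)
{
    if (!sm_key_mismatch(local_key, remote_key))
        return SM_KEY_OK;

    if (local_state == SM_STATE_MASTER)
        return SM_KEY_REPORT_AND_STAY;

    return SM_KEY_EXIT;
}
```

The sketch leaves out the relative priority of the received SMInfo, which
the original message also names as an input; a real change would need to
take it into account alongside the local state.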
