Title: RE: IB Diagnostic Tools

Hi Fabian

>
> I think this is a decent idea.  My only reservations are that it would require
> everyone to learn the OSM Vendor Layer API.  It might also not allow testing
> nuances in the access layer APIs, which might be useful.
[EZ] This is true, but the API is simple. The MAD flow API is:
bind - get a handle for sending MADs of a specific class and register callbacks
send - send a MAD
get_mad - get a MAD buffer
put_mad - return the buffer to the driver

The rest can be found in the OpenSM repository under osm_vendor_api.h
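
To make the flow above concrete, here is a toy in-memory model of it in Python. This is only a sketch: the class ToyVendor and all of its method signatures are made up for illustration and are not the real interface declared in osm_vendor_api.h. "Sending" simply loops the MAD back to the callback registered at bind time.

```python
# Toy model of the bind / get_mad / send / put_mad flow.
# ASSUMPTION: every name and signature here is hypothetical; this is
# not the OpenSM vendor layer API, only an illustration of the flow.

MAD_SIZE = 256  # an InfiniBand MAD is 256 bytes


class ToyVendor:
    def __init__(self, pool_size=4):
        # Small pool of MAD buffers owned by the "driver".
        self._free = [bytearray(MAD_SIZE) for _ in range(pool_size)]
        self._callbacks = {}

    def bind(self, mgmt_class, on_receive):
        """Register a receive callback for one class; return a handle."""
        self._callbacks[mgmt_class] = on_receive
        return mgmt_class  # in this toy the handle is just the class number

    def get_mad(self):
        """Take a MAD buffer from the pool."""
        return self._free.pop()

    def put_mad(self, mad):
        """Clear the buffer and return it to the pool."""
        mad[:] = bytes(len(mad))
        self._free.append(mad)

    def send(self, handle, mad):
        """Loop the MAD straight back to the callback bound to the handle."""
        self._callbacks[handle](mad)

    def free_count(self):
        return len(self._free)


received = []
vendor = ToyVendor()


def on_receive(mad):
    # Callback style: the tool reacts to an incoming MAD instead of polling.
    received.append(bytes(mad[:4]))
    vendor.put_mad(mad)  # give the buffer back once it has been handled


handle = vendor.bind(0x81, on_receive)  # 0x81 used as an example class value
mad = vendor.get_mad()
mad[0:4] = b"\x01\x81\x01\x01"          # arbitrary example header bytes
vendor.send(handle, mad)
```

The point of the sketch is the ownership pattern: a buffer is taken with get_mad, handed over with send, and returned with put_mad from inside the callback.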

>
> So I think it would be useful to have the test run over each low level MAD API,
> as well as to the OSM Vendor Layer.  I'm a bit wary of adding extra layers
> between the tests and the access layer - it just creates more areas where things
> can go wrong.  That said, I'm not dead set on this and could be convinced
> otherwise, but I just don't know enough about the OSM Vendor Layer at the moment
> and don't have many cycles to learn it.
[EZ] I agree. Code testing should be done at every layer. But writing cluster debug tools is easier on top of a higher abstraction layer (callbacks rather than polling or blocking reads).

>
>
> By system names, you mean node descriptions?
[EZ] If the user provides a file describing the topology in terms of systems, then the code uses the names given in that file in its reports.

For example, assume you have a cluster built of a 288-port switch and 288 HCAs.
The topology description could then be:
IBSW288 mySwitch
   Leaf1/P1 -> HCA Rack1-Node1 P1
   Leaf1/P2 -> HCA Rack1-Node2 P1
   ...
   Leaf1/P12 -> HCA Rack2-Node3 P1
   Leaf2/P1 -> HCA anyNameYouWant P2
   ....

Any error report can then be given in terms of these names, for example:
Error with cable from mySwitch/Leaf2/P1 to anyNameYouWant/P2
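
As a rough illustration of how a tool could map a topology file to such names, here is a small Python sketch. It assumes a simplified grammar inferred from the example above (one unindented "type name" line per system, then "port -> type name port" lines); the function names parse_topology and cable_error are hypothetical, and the real file format may well have more to it.

```python
# Minimal sketch: parse a topology description and name a cable in a report.
# ASSUMPTION: the grammar below is guessed from the example in this mail;
# parse_topology and cable_error are illustrative names, not an existing tool.

SAMPLE = """\
IBSW288 mySwitch
   Leaf1/P1 -> HCA Rack1-Node1 P1
   Leaf1/P2 -> HCA Rack1-Node2 P1
   ...
   Leaf2/P1 -> HCA anyNameYouWant P2
"""


def parse_topology(text):
    """Return {(system, local_port): (remote_name, remote_port)}."""
    links = {}
    system = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or set(line) == {"."}:
            continue  # skip blank lines and "..." elisions
        if "->" in line:
            # Connection line: "<local port> -> <type> <name> <port>"
            local_port, remote = (part.strip() for part in line.split("->", 1))
            _remote_type, remote_name, remote_port = remote.split()
            links[(system, local_port)] = (remote_name, remote_port)
        else:
            # System line: "<system type> <system name>"
            _system_type, system = line.split()
    return links


def cable_error(links, system, local_port):
    """Format an error report using the user-supplied names."""
    remote_name, remote_port = links[(system, local_port)]
    return f"Error with cable from {system}/{local_port} to {remote_name}/{remote_port}"


links = parse_topology(SAMPLE)
message = cable_error(links, "mySwitch", "Leaf2/P1")
```

Keying the table by (system, local port) means a fault found on one end of a link can be reported with the names of both ends in a single lookup.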

