Thanks for bringing this up, Sean. A little more rationale:
1. We basically modeled usnic_devinfo(1) off ibv_devinfo(1); it's primarily
aimed at sysadmins/IT as part of verification that our usNIC stack is
functioning properly. It's currently using usnic provider extensions to get
this information (look at the "Fabric extensions: netinfo" section in
https://ofiwg.github.io/libfabric/v1.6.0/man/fi_usnic.7.html), but it would be
nice if this kind of stuff was standardized somehow. Here's output from
usnic_devinfo:
$ /opt/cisco/usnic/bin/usnic_devinfo -d usnic_0
usnic_0:
    Interface: vic20
    MAC Address: 24:57:20:06:20:00
    IP Address: 10.10.0.6
    Netmask: 255.255.0.0
    Prefix len: 16
    MTU: 9000
    Link State: UP
    Bandwidth: 10 Gb/s
    Device ID: UCSC-PCIE-CSC-02 [VIC 1225] [0x0085]
    Vendor ID: 4407
    Vendor Part ID: 207
    Firmware: 4.1(1d)
    VFs: 64
    CQ per VF: 4
    QP per VF: 6
    Interrupts per VF: 4
    Max CQ: 256
    Max CQ Entries: 65535
    Max QP: 384
    Max Send Credits: 4095
    Max Recv Credits: 4095
    Capabilities:
        Map per res: yes
        PIO sends: no
        CQ interrupts: no
I'm quite sure a lot of this specific information is unique to our device; I
don't think that these exact fields are worth standardizing, of course.
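To illustrate the sysadmin-verification use case, here is a minimal sketch of a script that checks a couple of fields from usnic_devinfo-style output. It assumes the "Key: Value" layout shown above stays stable, which is an assumption on my part and not a documented guarantee. The helper name parse_devinfo and the trimmed SAMPLE text are hypothetical; only the field names come from the output above.

```python
def parse_devinfo(text):
    """Parse 'Key: Value' lines into a dict.

    Lines with nothing after the colon (e.g. 'usnic_0:' or
    'Capabilities:') are treated as section headers and skipped.
    """
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so MAC addresses stay intact.
        key, sep, value = line.strip().partition(":")
        value = value.strip()
        if sep and value:
            fields[key.strip()] = value
    return fields


# Hypothetical, trimmed sample in the same shape as the output above.
SAMPLE = """\
usnic_0:
    Interface: vic20
    MAC Address: 24:57:20:06:20:00
    MTU: 9000
    Link State: UP
    Capabilities:
        PIO sends: no
"""

info = parse_devinfo(SAMPLE)
print(info["Link State"], info["MTU"])
```

Scraping text like this is fragile, which is part of why a standardized way to query such attributes would be nicer.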
2. Another topic that comes up not infrequently is the ability to correlate a
fabric/domain/endpoint to some other corresponding Linux entity, such as an IP
interface and/or PCI device (if relevant). This obviously doesn't work for
fabrics/domains/endpoints that represent emulation devices, may be tricky for
bonded devices, etc. But there are many providers that create
fabrics/domains/endpoints that directly correlate with a specific Linux device.
Tools like hwloc (and therefore Open MPI) could definitely use this
information for determining locality, especially where short message latency
matters.
Some sort of optional fabric/domain/endpoint correlation to a Linux device
would be genuinely useful.
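As a rough illustration of the Linux side of that correlation, here is a minimal sketch that goes from an IP interface name to its backing device and NUMA node using sysfs. This is not a libfabric API; it assumes the standard /sys/class/net/<ifname>/device symlink, and the function name pci_device_for_interface is hypothetical. For a PCI-backed NIC the last path component is normally the PCI address; interfaces on other buses or behind virtualization layers may resolve to something else.

```python
import os


def pci_device_for_interface(ifname, sysfs_net="/sys/class/net"):
    """Return (device_name, numa_node) for a network interface.

    Returns None when the interface has no backing device entry
    (e.g. loopback or other purely virtual interfaces).
    numa_node is None if it cannot be read; the kernel reports -1
    when it does not know the node.
    """
    dev_link = os.path.join(sysfs_net, ifname, "device")
    if not os.path.exists(dev_link):
        return None
    # Resolve the symlink to the real device directory in sysfs.
    dev_dir = os.path.realpath(dev_link)
    numa_node = None
    try:
        with open(os.path.join(dev_dir, "numa_node")) as f:
            numa_node = int(f.read().strip())
    except (OSError, ValueError):
        pass
    return os.path.basename(dev_dir), numa_node


if __name__ == "__main__":
    # "vic20" is the interface name from the usnic_devinfo output above.
    print(pci_device_for_interface("vic20"))
```

What is missing today is the first hop: a portable way to get from a fabric/domain/endpoint to that interface name (or directly to the device) without provider-specific extensions.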
I honestly haven't given a ton of thought to either of these other than "that
would be useful"; apologies if this is somewhat half-baked.
> On May 3, 2018, at 4:45 PM, Hefty, Sean <[email protected]> wrote:
>
> There has been a long outstanding set of requests to obtain HW specific data
> from libfabric. A side discussion brought this topic up again, so I'd like
> to at least put it on the agenda as a possible feature for 1.7. As a point
> of reference, Cisco has implemented a set of provider specific ops to
> retrieve device specific data. It's fairly simple, and details are here:
>
> https://github.com/cisco/usnic_tools/blob/master/usnic_devinfo.c
>
> This feature would obviously only apply to providers that are directly
> associated with some sort of HW device.
>
> What I would like to start to collect is a list of what sort of attributes
> would be desirable to report, or what applications or users could make use of.
>
> - Sean
> _______________________________________________
> ofiwg mailing list
> [email protected]
> http://lists.openfabrics.org/mailman/listinfo/ofiwg
--
Jeff Squyres
[email protected]