Hi Hal,
I agree that the index 0 of the guid,lids and the new linkstates arrays
should be reserved for the default port. In the loop the index j is used
to loop over all ports 0 .. N of the HCA's. It is clear that for HCA's
port 0 will be skipped. However, since the current code does not advance
the lid and linkstate accordingly the place for the port 0 will not be
kept empty for the port 0.
Current code:
for (j = 0; j <= ca.numports; j++) {
if (ca.ports[j]) {
*p_lid = ca.ports[j]->base_lid;
*p_linkstates = ca.ports[j]->state;
p_lid++;
p_linkstates++;
}
}
Should be:
for (j = 0; j <= ca.numports; j++) {
if (ca.ports[j]) {
*p_lid = ca.ports[j]->base_lid;
*p_linkstates = ca.ports[j]->state;
}
/* as j advance even if the port is not valid, so should the
lid and state pointer */
p_lid++;
p_linkstates++;
}
As I could not convince you with the above explanations in my previous
mail I have written the following simple program to test the pre-and
post patch effect:
/*
test program for dumping osm_vendor_get_all_port_attr results
*/
#include "stdio.h"
#include <stdlib.h>
#include <unistd.h>
#include <vendor/osm_vendor_api.h>
#include <opensm/osm_opensm.h>
#include <common.h>
#define GUID_ARRAY_SIZE 64
int
main() {
osm_vendor_t vendor;
osm_log_t osm_log;
ib_api_status_t status;
uint32_t num_ports = GUID_ARRAY_SIZE;
ib_port_attr_t attr_array[GUID_ARRAY_SIZE];
int i;
osm_log_construct(&osm_log);
osm_log_init(&osm_log, TRUE, 0xff, "/tmp/test_vendor.log");
osm_vendor_init(&vendor, &osm_log, 1000);
status = osm_vendor_get_all_port_attr(&vendor, attr_array, &num_ports );
if ( status != IB_SUCCESS )
{
printf( "\nError from osm_vendor_get_all_port_attr (%x)\n", status);
return;
}
printf("\nListing GUIDs:\n");
for (i = 0; i < num_ports; i++) {
printf("Port %i:0x%"PRIx64" lid:0x%04x state:%x\n",
i,
cl_hton64(attr_array[i].port_guid),
cl_ntoh16(attr_array[i].lid),
attr_array[i].link_state
);
}
exit(0);
}
Without the above change I get:
Listing GUIDs:
Port 0:0xd9dffffff3d55 lid:0x0300 state:4
Port 1:0xd9dffffff3d55 lid:0x0400 state:4
Port 2:0xd9dffffff3d56 lid:0x0000 state:0
After the simple change I get:
Listing GUIDs:
Port 0:0xd9dffffff3d55 lid:0x0300 state:4
Port 1:0xd9dffffff3d55 lid:0x0300 state:4
Port 2:0xd9dffffff3d56 lid:0x0400 state:4
So as you can see - without the fix the lid of port 2 is presented as
the lid of port 1...
I guess you use ibstatus in your mail. Well ibstatus uses its own code
so it shows the correct info anyway. In my case that is:
swlab223:/tmp/bld/libvendor>ibstatus
Infiniband device 'mthca0' port 1 status:
default gid: fe80:0000:0000:0000:000d:9dff:ffff:3d55
base lid: 0x3
sm lid: 0x1
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 10 Gb/sec (4X)
Infiniband device 'mthca0' port 2 status:
default gid: fe80:0000:0000:0000:000d:9dff:ffff:3d56
base lid: 0x4
sm lid: 0x1
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 10 Gb/sec (4X)
Hal Rosenstock wrote:
Hi Eitan,
On Sun, 2005-08-21 at 03:32, Eitan Zahavi wrote:
osm_vendor_get_all_port_attr returns incorrect LID and state for
device ports. This bug was caused by the fact that if a device port
was skipped due to that fact it does not exist (HCA port 0). The
lid and state pointers used as indexes into their corresponding
return value arrays were not advancing to the next port index.
So the return for a single HCA was mixing LID and state for the first
port and displayed non initialized memory for the second port.
The array is not filled in as you claim. Port 0 does not take a slot on
an HCA. This looks fine to me as is (I added some print statements in
that loop as follows):
osm_vendor_get_all_port_attr: port 0
osm_vendor_get_all_port_attr: port 1
osm_vendor_get_all_port_attr: port 1 lid 1 state 4
osm_vendor_get_all_port_attr: port 2
osm_vendor_get_all_port_attr: port 2 lid 0 state 1
Port 0 is skipped; port 1 is LID 1 and active; port 2 is not plugged in
and is down:
Port 1:
State: Active
Physical state: LinkUp
Rate: 2
Base lid: 1
LMC: 0
SM lid: 1
Capability mask: 0x00500a68
Port GUID: 0x0008f10403960559
Port 2:
State: Down
Physical state: Polling
Rate: 2
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x00500a68
Port GUID: 0x0008f1040396055a
-- Hal
_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general