Hal Rosenstock wrote:
Hi Eitan,

On Sun, 2005-08-28 at 10:43, Eitan Zahavi wrote:
I agree that the index 0 of the guid,lids and the new linkstates arrays
should be reserved for the default port. In the loop the index j is used
to loop over all ports 0 .. N of the HCA's. It is clear that for HCA's
port 0 will be skipped. However, since the current code does not advance
the lid and linkstate accordingly the place for the port 0 will not be
kept empty for the port 0.

Current code:
for (j = 0; j <= ca.numports; j++) {
  if (ca.ports[j]) {
     *p_lid = ca.ports[j]->base_lid;
     *p_linkstates = ca.ports[j]->state;
     p_lid++;
     p_linkstates++;
  }
}
Should be:
for (j = 0; j <= ca.numports; j++) {
  if (ca.ports[j]) {
     *p_lid = ca.ports[j]->base_lid;
     *p_linkstates = ca.ports[j]->state;
  }
  /* as j advance even if the port is not valid, so should the
     lid and state pointer */
  p_lid++;
  p_linkstates++;
}

As I could not convince you with the above explanations in my previous
mail I have written the following simple program to test the pre-and
post patch effect:

/*
  test program for dumping osm_vendor_get_all_port_attr results
*/

#include "stdio.h"
#include <stdlib.h>
#include <unistd.h>
#include <vendor/osm_vendor_api.h>
#include <opensm/osm_opensm.h>

#include <common.h>
#define GUID_ARRAY_SIZE 64
int
main() {
  osm_vendor_t vendor;
  osm_log_t osm_log;
  ib_api_status_t status;
  uint32_t num_ports = GUID_ARRAY_SIZE;
  ib_port_attr_t attr_array[GUID_ARRAY_SIZE];
  int i;

  osm_log_construct(&osm_log);
  osm_log_init(&osm_log, TRUE, 0xff, "/tmp/test_vendor.log");

  osm_vendor_init(&vendor, &osm_log, 1000);

  status = osm_vendor_get_all_port_attr(&vendor, attr_array, &num_ports );
  if ( status != IB_SUCCESS )
  {
    printf( "\nError from osm_vendor_get_all_port_attr (%x)\n", status);
    return;
  }

  printf("\nListing GUIDs:\n");
  for (i = 0; i < num_ports; i++) {
    printf("Port %i:0x%"PRIx64" lid:0x%04x state:%x\n",
           i,
           cl_hton64(attr_array[i].port_guid),
           cl_ntoh16(attr_array[i].lid),
           attr_array[i].link_state
           );
  }

  exit(0);
}

Without the above change I get:
Listing GUIDs:
Port 0:0xd9dffffff3d55 lid:0x0300 state:4
Port 1:0xd9dffffff3d55 lid:0x0400 state:4
Port 2:0xd9dffffff3d56 lid:0x0000 state:0

After the simple change I get:
Listing GUIDs:
Port 0:0xd9dffffff3d55 lid:0x0300 state:4
Port 1:0xd9dffffff3d55 lid:0x0300 state:4
Port 2:0xd9dffffff3d56 lid:0x0400 state:4

So as you can see - without the fix the lid of port 2 is presented as the lid of port 1...


I understand the difference in the code and think the difference
perhaps relates to either a lack of clarity or confusion with the API as
follows: I don't see where it is defined what the index into the port
array means. I think we have 2 different interpretations and this
relates to how opensm/main.c  handles the results of calling this
routine.
I do not follow you. Do you suggest it is OK the port at index 1 will have
the guid of port 1 but the lid and state of port 2?

I did not complain about what port is reported at what index:
Just about the mismatch of guids and lids. Please see above.


So the patch is incomplete although perhaps correct depending on the
interpretation. I'm not adverse to changing this as you indicate. I
would like to resolve this before embarking on the 1.8.0 merge.

Also, since we are in this area, I don't think switch port 0 would be
handled correctly by this code either.


I guess you use ibstatus in your mail. Well ibstatus uses its own code
so it shows the correct info anyway.


That was just to show that the port states corresponded to the ones
shown by osm_vendor_get_all_port_attr with the print statements. Nothing
else.

-- Hal


In my case that is:
swlab223:/tmp/bld/libvendor>ibstatus
Infiniband device 'mthca0' port 1 status:
        default gid:     fe80:0000:0000:0000:000d:9dff:ffff:3d55
        base lid:        0x3
        sm lid:          0x1
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            10 Gb/sec (4X)

Infiniband device 'mthca0' port 2 status:
        default gid:     fe80:0000:0000:0000:000d:9dff:ffff:3d56
        base lid:        0x4
        sm lid:          0x1
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            10 Gb/sec (4X)



Hal Rosenstock wrote:

Hi Eitan,

On Sun, 2005-08-21 at 03:32, Eitan Zahavi wrote:


osm_vendor_get_all_port_attr returns incorrect LID and state for device ports. This bug was caused by the fact that if a device port was skipped due to that fact it does not exist (HCA port 0). The lid and state pointers used as indexes into their corresponding
return value arrays were not advancing to the next port index.

So the return for a single HCA was mixing LID and state for the first
port and displayed non initialized memory for the second port.


The array is not filled in as you claim. Port 0 does not take a slot on
an HCA. This looks fine to me as is (I added some print statements in
that loop as follows):

osm_vendor_get_all_port_attr: port 0
osm_vendor_get_all_port_attr: port 1
osm_vendor_get_all_port_attr: port 1 lid 1 state 4
osm_vendor_get_all_port_attr: port 2
osm_vendor_get_all_port_attr: port 2 lid 0 state 1

Port 0 is skipped; port 1 is LID 1 and active; port 2 is not plugged in
and is down:

       Port 1:
               State: Active
               Physical state: LinkUp
               Rate: 2
               Base lid: 1
               LMC: 0
               SM lid: 1
               Capability mask: 0x00500a68
               Port GUID: 0x0008f10403960559
       Port 2:
               State: Down
               Physical state: Polling
               Rate: 2
               Base lid: 0
               LMC: 0
               SM lid: 0
               Capability mask: 0x00500a68
               Port GUID: 0x0008f1040396055a

-- Hal


_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

Reply via email to