Sean Hefty wrote:


The RMPP code returns the size of the receive as sizeof MAD header + sizeof RMPP
header + optional sizeof other header (e.g. SA header) + actual payload.  This
size can be used to allocate a data buffer large enough to hold the reassembled
MAD.  You should be able to use this to determine the number of records in the
payload.

Good. But how is that size delivered? I mean, how does it get through umad to the client?

From my first email on this thread you can see there is at least one bug in the chain of events:
a. First segment paylen should be either 0 or the correct value - it is
   neither. It should be 264 but is 440.
b. Last segment paylen MUST be updated to reflect the size of the data
   in the MAD (including class header) - it should be 44 (24 bytes of
   leftover data plus the 20-byte SA header) but is 100.
c. In the receiver the re-assembled data size is not correct. OpenSM
   reports it got a 200-byte MAD back. Probably a bug in the vendor
   layer or umad.
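
The expected values in (a) and (b) can be reproduced with a few lines of arithmetic. This is only a sketch of the calculation described in this thread, not code from the RMPP implementation: the function name is hypothetical, and the sizes (112-byte NodeRecord, 20-byte SA header, 200 bytes of record data per full segment) are the figures quoted in this email.

```python
import math

# All sizes below are the figures quoted in this thread.
SA_HEADER_SIZE = 20      # "SA extra header" carried in every segment
SEGMENT_DATA_SIZE = 200  # record data that fits in one full segment
NODE_RECORD_SIZE = 112   # NodeRecord size including the 4 bytes of padding


def expected_paylens(num_records):
    """Return (segments, first-segment PayLen, last-segment PayLen).

    Hypothetical helper: follows the reading of the spec given in this
    thread, where the first PayLen covers all data plus one class header
    per segment, and the last PayLen covers the leftover data plus one
    class header.
    """
    data_len = num_records * NODE_RECORD_SIZE
    num_segments = max(1, math.ceil(data_len / SEGMENT_DATA_SIZE))
    first = data_len + SA_HEADER_SIZE * num_segments
    leftover = data_len - (num_segments - 1) * SEGMENT_DATA_SIZE
    last = leftover + SA_HEADER_SIZE
    return num_segments, first, last


print(expected_paylens(2))  # (2, 264, 44)
```

For the two-record response this gives 264 for the first segment and 44 for the last one (24 bytes of leftover data plus the 20-byte SA header), against the 440 and 100 seen on the wire.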

Here is the full data again.



1.      NodeRecord MAD size is 112 bytes (note the required padding of 4
bytes at the end of the NodeRec data).
2.      OpenSM log file shows the query should return 2 records, one for
each end-port. This is indeed what happens:


        Aug 21 14:59:49 998104 [40D9DBB0] -> __osm_nr_rcv_create_nr: Looking for NodeRecord with LID: 0x0 GUID:0x0000000000000000
        Aug 21 14:59:49 998224 [40D9DBB0] -> __osm_nr_rcv_new_nr: New NodeRecord: node 0x0002c902000017a0
                                        port 0x0002c902000017a1, lid 0x1.
        Aug 21 14:59:49 998327 [40D9DBB0] -> __osm_nr_rcv_new_nr: New NodeRecord: node 0x0002c902000017a0
                                        port 0x0002c902000017a2, lid 0x2.
        Aug 21 14:59:49 998395 [40D9DBB0] -> osm_nr_rcv_process: Returning 2 records.

3.      On the wire we see the following (see attached gif for more
details):
a.      Two data segments were sent and two ACKs were returned. This is
OK.
b.      The first segment reports PayLen = 440 bytes. According to the
spec, the first segment may provide a nonzero PayLen, and when it does
the value should equal (class header size * number of segments) + data
length. In our case the data length is 2*112 = 224 bytes and the SA
extra header is 20 bytes * 2 segments = 40 bytes. This leads to
PayLen = 264 and not 440!!!
The spec defines that in p775-l37.
So this is a violation of the spec.
c.      The last segment (segment 2) provides a PayLen field of 100.
The expected value for the last segment should have been: SA extra
header + leftover data size from the previous segments. Since the first
segment carries 200 bytes of data, the leftover should have been
112*2 - 200 = 24 bytes. With the 20-byte SA extra header that makes
44 bytes.
So this is another violation of the spec.
d.      The analyzer is confused by the above and reports the result as
having 3 NodeRecords.
e.      <<Gen2 NodeRec GetTable RMPP Format Error.GIF>>
4.      Following that, when we trace the osmtest log file we find
more issues, probably caused by changes to the vendor layer or the RMPP
assembly. It is expected that after assembly the size of the RMPP MAD
reported to the osm vendor layer will be the RMPP header + SA extra
header + data size. In our case that is 32 + 20 + 2*112 = 276.

        The log file shows:

        Aug 21 14:59:49 [40D87BB0] -> __osmv_sa_mad_rcv_cb: Count = 1 = 200 / 112 (88)
        Aug 21 14:59:49 [4017F6C0] -> osmtest_write_all_node_recs: Received 1 records
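
To make the mismatch in item 4 concrete, here is a small sketch of the size and record-count arithmetic. It is not the vendor-layer code: the function names are hypothetical, the 32-byte header figure is simply the one used in the sum above, and reading the log line as "count = payload length / record size (remainder)" is an assumption based on the numbers it prints.

```python
# Sizes are the figures used in this email, not values read from any header file.
RMPP_HEADER_SIZE = 32    # header size used in the 32 + 20 + 2*112 sum above
SA_HEADER_SIZE = 20      # SA extra header
NODE_RECORD_SIZE = 112   # padded NodeRecord size


def expected_reassembled_size(num_records):
    """Hypothetical helper: size expected after RMPP reassembly."""
    return RMPP_HEADER_SIZE + SA_HEADER_SIZE + num_records * NODE_RECORD_SIZE


def count_records(payload_len, record_size=NODE_RECORD_SIZE):
    """Return (whole records, leftover bytes) for a given payload length."""
    return divmod(payload_len, record_size)


print(expected_reassembled_size(2))  # 276
print(count_records(200))            # (1, 88), the values shown in the log
print(count_records(2 * 112))        # (2, 0), what two records should give
```

A 200-byte payload yields one whole record with 88 bytes left over, which matches the "Count = 1 = 200 / 112 (88)" line, while the 224 bytes of record data that were actually sent would yield two records with nothing left over.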