Hi Hal,

It seems I was able to reproduce the osmtest failure (hopefully the same one
Viswa sees). I left it running for a while on a machine, and it failed after
736 iterations. Once it did, I stopped the loop.

From osm.log I see:
Sep 25 02:50:56 463143 [8003] -> osm_vendor_send: ERR 5430: Send p_madw = 
0x80a49f8 failed -5 (Cannot allocate memory).
...
Sep 25 02:50:57 463991 [C004] -> osm_vendor_send: ERR 5430: Send p_madw = 
0x80a49f8 failed -5 (Cannot allocate memory).
...
Sep 25 02:50:58 463751 [8003] -> osm_vendor_send: ERR 5430: Send p_madw = 
0x80a49f8 failed -5 (Cannot allocate memory).

Sep 25 02:50:59 462938 [C004] -> __osm_sr_rcv_respond: [
Sep 25 02:50:59 462955 [C004] -> __osm_sr_rcv_respond: Generating response with 
744 records.
...
Sep 25 02:50:59 463489 [C004] -> osm_vendor_send: RMPP 1 length 131000
Sep 25 02:50:59 463518 [C004] -> osm_vendor_send: ERR 5430: Send p_madw = 
0x80a49f8 failed -5 (Cannot allocate memory).
Sep 25 02:50:59 463549 [C004] -> __osm_sa_mad_ctrl_send_err_callback: [
Sep 25 02:50:59 463566 [C004] -> __osm_sa_mad_ctrl_send_err_callback: ERR 1A06: 
MAD transaction completed in error.

From osmtest I get:
Sep 25 02:50:56 461412 [4000] -> osmt_get_all_services_and_check_names: Getting 
All Service Records
Sep 25 02:50:56 461429 [4000] -> osmv_query_sa: [
Sep 25 02:50:56 461445 [4000] -> osmv_query_sa DBG:001 SVC_REC_BY_NAME
Sep 25 02:50:56 461462 [4000] -> __osmv_send_sa_req: [
Sep 25 02:50:56 461478 [4000] -> __osmv_get_lid_and_sm_lid_by_port_guid: [
Sep 25 02:50:56 461498 [4000] -> __osmv_get_lid_and_sm_lid_by_port_guid: Using 
previously stored lid:0x0001 sm_lid:0x0001
Sep 25 02:50:56 461515 [4000] -> __osmv_get_lid_and_sm_lid_by_port_guid: ]
Sep 25 02:50:56 461555 [4000] -> osm_mad_pool_get: [
...
Sep 25 02:51:00 461961 [8003] -> umad_receiver: ERR 5409: send completed with 
error (method=12 attr=31) -- dropping.
Sep 25 02:51:00 461979 [8003] -> umad_receiver: ERR 5410: class 0x3 LID 0x0
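
To make it easier to compare the two logs, here is a rough sketch of a script that tallies the ERR codes per function. The regular expressions are only my guess at the line layout, based on the excerpts above, and the names LINE_RE, ERR_RE and count_errors are made up for this example. Wrapped continuation lines and "..." markers are skipped.

```python
import re
from collections import Counter

# Assumed layout, inferred from the excerpts in this mail:
#   Sep 25 02:50:56 463143 [8003] -> osm_vendor_send: ERR 5430: Send p_madw =
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ \d\d:\d\d:\d\d) (?P<usec>\d+) "
    r"\[(?P<thread>[0-9A-Fa-f]+)\] -> (?P<func>\w+):? ?(?P<msg>.*)$"
)
ERR_RE = re.compile(r"ERR ([0-9A-Fa-f]{4})")


def count_errors(lines):
    """Count (function, error code) pairs in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            continue  # wrapped continuation line, "..." marker, or blank line
        err = ERR_RE.search(match.group("msg"))
        if err:
            counts[(match.group("func"), err.group(1))] += 1
    return counts


# A few lines copied from the excerpts above, including one continuation line.
SAMPLE = [
    "Sep 25 02:50:56 463143 [8003] -> osm_vendor_send: ERR 5430: Send p_madw = ",
    "0x80a49f8 failed -5 (Cannot allocate memory).",
    "Sep 25 02:50:57 463991 [C004] -> osm_vendor_send: ERR 5430: Send p_madw = ",
    "Sep 25 02:50:59 463566 [C004] -> __osm_sa_mad_ctrl_send_err_callback: ERR 1A06: ",
    "Sep 25 02:51:00 461979 [8003] -> umad_receiver: ERR 5410: class 0x3 LID 0x0",
]

for (func, code), n in sorted(count_errors(SAMPLE).items()):
    print(f"{func}: ERR {code} x{n}")
```

On a real log you would pass the open file object instead of SAMPLE, for example count_errors(open("/tmp/osmtest.log")).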

Is it possible that umad has a maximum limit on MAD size? It seems the SM
fails to allocate a MAD of the size required to answer the "get all service
records" query.
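
As a sanity check on the size involved, the "RMPP 1 length 131000" line can be reproduced from the record count. This is a sketch under my own assumptions about the wire layout (a 24-byte MAD header, 12-byte RMPP header, 20-byte SA header and a 176-byte ServiceRecord); none of these sizes appear in the logs, and the function name is made up for the example.

```python
# Sizes in bytes. These are assumptions based on my reading of the IBA
# MAD layout, not values taken from the logs.
MAD_HDR_SIZE = 24
RMPP_HDR_SIZE = 12
SA_HDR_SIZE = 20
SERVICE_RECORD_SIZE = 176


def sa_response_length(num_records, record_size=SERVICE_RECORD_SIZE):
    """Total length of an SA response carrying num_records records."""
    header = MAD_HDR_SIZE + RMPP_HDR_SIZE + SA_HDR_SIZE
    return header + num_records * record_size


# 744 records is the count reported by __osm_sr_rcv_respond in osm.log.
print(sa_response_length(744))
```

If those sizes are right, 744 records give 56 + 744 * 176 = 131000 bytes, which matches the length in osm.log. So the failing send is a single MAD of roughly 128 KB.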

Another interesting message is the last one:
"umad_receiver: ERR 5410: class 0x3 LID 0x0". Why is the reported LID 0?

Will you be able to handle the MAD allocation?

Please advise.

Eitan

Eitan Zahavi wrote:
Hi Viswa,

Please run step 4 in verbose mode: osmtest -f a -V -l /tmp/osmtest.log
If it fails, please send us a copy of /tmp/osmtest.log.

This is just a guess, but I think the "bug" will turn out to be that the SM
did not have a chance to clean up completely between the tests, and the tests
are picky about the SM state (number of services, multicast groups, etc.).

We will try to reproduce it here too.

Thanks

Eitan

Viswanath Krishnamurthy wrote:

On 23 Sep 2005 13:49:31 -0400, Hal Rosenstock < [EMAIL PROTECTED]> wrote:

Hi Viswa,

On Fri, 2005-09-23 at 13:43, Viswanath Krishnamurthy wrote:


More information,

The test case is as follows

1. Start opensm in verbose mode (-V)
2. Ping remote node
3. osmtest -f c
4. osmtest -f a
5. pkill -9 opensm
6. Repeat

Out of about 2500 iterations, osmtest failed 143 times. Keep in mind,
only step 4 failed.
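
For anyone who wants to repeat this, the loop above can be sketched roughly as follows. It makes several assumptions: osmtest exits non-zero on failure, a few seconds of settle time is enough for opensm to come up, and the helper names run_iteration and repeat are invented for this example. Step 2 (pinging the remote node) is left out because which tool to use depends on the setup.

```python
import subprocess
import time
from collections import Counter


def run_iteration(run=subprocess.call, start=subprocess.Popen,
                  settle=5.0, sleep=time.sleep):
    """Run one pass of the test case; return the failed step or None."""
    sm = start(["opensm", "-V"])          # step 1: start opensm in verbose mode
    sleep(settle)                         # assumed settle time, not from the thread
    try:
        # Step 2 (ping remote node) is intentionally omitted here.
        if run(["osmtest", "-f", "c"]) != 0:   # step 3: create inventory file
            return "osmtest -f c"
        if run(["osmtest", "-f", "a"]) != 0:   # step 4: run all flows
            return "osmtest -f a"
        return None
    finally:
        run(["pkill", "-9", "opensm"])    # step 5: kill the SM
        sm.wait()                         # reap the killed process


def repeat(iterations, iteration=run_iteration):
    """Step 6: repeat and count how often each step failed."""
    failures = Counter()
    for _ in range(iterations):
        failed = iteration()
        if failed is not None:
            failures[failed] += 1
    return failures
```

Calling repeat(2500) would run the whole sequence 2500 times and return a count of failures per step. The run and start parameters exist only so the loop can be exercised with stand-in functions before pointing it at a real fabric.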


Yes.

Do you see any port LEDs on the switch blink while running this, indicating
that the port went down from active and back?



No, I ran this test overnight and logged the results.  I will try it next week 
and let you know.





Step 3, which is inventory file creation, *never* failed. (I think
inventory file creation also talks to the SA, right?)


Right.

-- Hal







_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

