On Oct 14, 2005, at 2:27 PM, Troy Benjegerdes wrote:



From: Hal Rosenstock <[EMAIL PROTECTED]>
Date: October 14, 2005 12:41:13 PM CDT
To: Troy Benjegerdes <[EMAIL PROTECTED]>
Cc: IBMEHCA DD <[EMAIL PROTECTED]>, [email protected]
Subject: Re: [openib-general] Re: IBM eHCA testing..


On Fri, 2005-10-14 at 12:08, Troy Benjegerdes wrote:
Hal Rosenstock wrote:

On Thu, 2005-10-13 at 18:46, Troy Benjegerdes wrote:


I'm also attaching part of an opensm log file.

(the full copy is at http://scl.ameslab.gov/~troy/osm-ehca.log )

The IBM galaxy adapters are at:
        Initial path: [0][1][16]
        Initial path: [0][1][13]




OpenSM is just saying that an SMP transaction it issued (in this
case, SM Get P_KeyTable) is timing out (no response made it back to
OpenSM).

BTW, what svn rev is OpenSM up to ?

-- Hal


So, how about a patch to opensm to report what svn rev it was built from ;)

Can you do svn info in the userspace/management/osm directory ?

Path: .
URL: https://openib.org/svn/gen2/trunk/src/linux-kernel/infiniband
Repository UUID: 21a7a0b7-18d7-0310-8e21-e8b31bdbf5cd
Revision: 3493
Node Kind: directory
Schedule: normal
Last Changed Author: roland
Last Changed Rev: 3487
Last Changed Date: 2005-09-19 17:59:27 -0500 (Mon, 19 Sep 2005)
Properties Last Updated: 2005-02-15 16:24:20 -0600 (Tue, 15 Feb 2005)


I just discovered another problem. We have been running PVFS2 over
IPoIB on the same subnet, and while debugging this, I restarted opensm
several times, and somewhere in the stack a PVFS2 write failed. I
wouldn't think that a short downtime of the SM from restarting it would
cause any IPoIB TCP sessions to fall over.

As Fab indicated, there are a number of places where the SM/SA is
needed:
1. SA PathRecords (used when a path to a new IP end node is needed or an
existing one times out)
2. SA MCMemberRecord joins, queries, and leaves (used when an interface
is brought up, taken down, etc.)

Is this on an existing TCP session ? Is it OpenIB IPoIB clients at each
end ? What svn version is being used for this ?

-- Hal

It looks like each client node maintains an open TCP stream to each of the servers. pvfs2 appears not to be very robust to failure. However, the pvfs2 folks just released a new version which changes their network protocol somewhat. I plan to get the new version installed next week and will see if it handles things a bit more robustly.

Brett

_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general
