On Oct 14, 2005, at 2:27 PM, Troy Benjegerdes wrote:
From: Hal Rosenstock <[EMAIL PROTECTED]>
Date: October 14, 2005 12:41:13 PM CDT
To: Troy Benjegerdes <[EMAIL PROTECTED]>
Cc: IBMEHCA DD <[EMAIL PROTECTED]>, [email protected]
Subject: Re: [openib-general] Re: IBM eHCA testing..
On Fri, 2005-10-14 at 12:08, Troy Benjegerdes wrote:
Hal Rosenstock wrote:
On Thu, 2005-10-13 at 18:46, Troy Benjegerdes wrote:
I'm also attaching part of an opensm log file.
(the full copy is at http://scl.ameslab.gov/~troy/osm-ehca.log )
The IBM galaxy adapters are at:
Initial path: [0][1][16]
Initial path: [0][1][13]
The OpenSM is just saying that a SMP transaction it issued (in this
case, SM Get P_KeyTable) is timing out (no response made it back to
OpenSM).
BTW, what svn rev is OpenSM up to ?
-- Hal
So, how about a patch to opensm to report what svn rev it was built
from ;)
Can you do svn info in the userspace/management/osm directory ?
Path: .
URL: https://openib.org/svn/gen2/trunk/src/linux-kernel/infiniband
Repository UUID: 21a7a0b7-18d7-0310-8e21-e8b31bdbf5cd
Revision: 3493
Node Kind: directory
Schedule: normal
Last Changed Author: roland
Last Changed Rev: 3487
Last Changed Date: 2005-09-19 17:59:27 -0500 (Mon, 19 Sep 2005)
Properties Last Updated: 2005-02-15 16:24:20 -0600 (Tue, 15 Feb 2005)
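For anyone who wants to report these numbers automatically, the two revision fields in `svn info` output like the above can be pulled out with a few lines of Python. This is only a minimal sketch: `parse_svn_info` is a hypothetical helper name, not part of OpenSM or Subversion, and the sample text is an excerpt of the output quoted above. "Revision" is the working-copy revision, while "Last Changed Rev" is the last commit that touched this path.

```python
def parse_svn_info(text):
    """Parse `svn info` style 'Key: value' lines into a dict (hypothetical helper)."""
    info = {}
    for line in text.splitlines():
        # Split on the first ": " only, so URLs such as https://... stay intact.
        key, sep, value = line.partition(": ")
        if sep:
            info[key.strip()] = value.strip()
    return info


# Excerpt of the output quoted in this thread.
SAMPLE = """Path: .
URL: https://openib.org/svn/gen2/trunk/src/linux-kernel/infiniband
Revision: 3493
Last Changed Author: roland
Last Changed Rev: 3487
"""

info = parse_svn_info(SAMPLE)
revision = int(info["Revision"])              # working-copy revision
last_changed = int(info["Last Changed Rev"])  # last commit touching this path
print(revision, last_changed)
```

In practice the text could come from running `svn info` in the directory of interest and capturing its standard output.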
I just discovered another problem. We have been running PVFS2 over
IPoIB on the same subnet, and while debugging this I restarted opensm
several times; somewhere in the stack a PVFS2 write failed. I
wouldn't think that a short downtime of the SM from restarting it
would cause any IPoIB TCP sessions to fall over.
As Fab indicated, there are a number of places where the SM/SA is
needed:
1. SA PathRecords (used when a path to a new IP end node is needed or
an existing one times out)
2. SA MCMemberRecord joins, queries, and leaves (used when an interface
is up'ed, down'ed, etc.)
Is this on an existing TCP session ? Is it OpenIB IPoIB clients at each
end ? What svn version is being used for this ?
-- Hal
It looks like each client node maintains an open TCP stream to each of
the servers. PVFS2 does not appear to be very robust to failure. However,
the PVFS2 folks just released a new version which changes their network
protocol somewhat. I plan to get the new version installed next week
and will see if it handles things a bit more robustly.
Brett
_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general