We're seeing problems which seem to be lock-related and result in data
loss. This is a fairly low probability event: it's happening on a ~1000
core compute farm which is heavily loaded and the frequency is of the
order of tens of failures an hour.
The clients are running kernel 2.6.22 and lustre
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
On Tue, Feb 10, 2009 at 1:44 AM, Isaac Huang he.hu...@sun.com wrote:
On Mon, Feb 09, 2009 at 04:52:20PM +0100, Götz Waschk wrote:
My client has this in modprobe.conf:
options lnet networks=o2ib,tcp
I'm trying to mount the remote network with
mount -t lustre 141.34.228...@tcp0:/atlas
On Mon, Feb 09, 2009 at 04:52:20PM +0100, Götz Waschk wrote:
Hello everyone,
.
My client has this in modprobe.conf:
options lnet networks=o2ib,tcp
I'm trying to mount the remote network with
mount -t lustre 141.34.228...@tcp0:/atlas /scratch/lustre-1.6/atlas
and the command just hangs,
On Tue, Feb 10, 2009 at 1:45 PM, Johann Lombardi joh...@sun.com wrote:
38 = MDS_CONNECT. The client tries to reach the MDT via 192.168.22...@o2ib,
whereas I think it should use tcp to access the Lustre filesystem of the
remote cluster. Is my understanding of your configuration correct?
That's
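To make the o2ib-versus-tcp question above concrete, here is a minimal sketch that prints the LNet network part of a NID, assuming NIDs have the form address@network. The helper name nid_network and the sample addresses are hypothetical and are not taken from the thread.

```shell
#!/bin/sh
# nid_network is a hypothetical helper: print the network part of a NID
# written as address@network, or fail if the argument has no "@".
nid_network() {
    case "$1" in
        *@*) printf '%s\n' "${1##*@}" ;;  # keep everything after the last "@"
        *)   return 1 ;;                  # not a NID in the assumed form
    esac
}
```

For example, `nid_network 10.0.0.1@tcp0` prints `tcp0`. Comparing the result for the NID in the mount command with the NID shown in the client's error message should show whether the client is trying the network it was meant to use.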
Oleg Drokin wrote:
What would be useful here is if you can enable dlm tracing
(echo +dlm_trace > /proc/sys/lnet/debug) on some of those 1.6 nodes
(if you are running with no debug enabled at all, also enable the
rpc_trace and info levels), and also enable the dump on eviction feature.
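The steps described above could be scripted roughly as follows. This is a sketch under assumptions: the dump-on-eviction path /proc/sys/lustre/dump_on_eviction is not given in the message and is assumed here, and the function and variable names are hypothetical. Both paths can be overridden through DEBUG_FILE and DUMP_FILE.

```shell
#!/bin/sh
# Sketch only. DEBUG_FILE comes from the message; DUMP_FILE is an
# assumed location for the dump-on-eviction switch.
DEBUG_FILE="${DEBUG_FILE:-/proc/sys/lnet/debug}"
DUMP_FILE="${DUMP_FILE:-/proc/sys/lustre/dump_on_eviction}"

# Write each named flag to the debug file with a leading "+", which
# asks for the flag to be added to the current mask, not to replace it.
enable_debug_flags() {
    for flag in "$@"; do
        echo "+$flag" > "$DEBUG_FILE" || return 1
    done
}

# Ask the client to dump its debug log when it is evicted.
enable_dump_on_eviction() {
    echo 1 > "$DUMP_FILE"
}
```

Run as root on a client, for example `enable_debug_flags dlm_trace rpc_trace info && enable_dump_on_eviction`. The flag names here are spelled as in the message; some releases may spell them without the underscore (dlmtrace, rpctrace), so it is worth checking the names that `cat /proc/sys/lnet/debug` prints before relying on them.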
Hello!
On Feb 10, 2009, at 12:11 PM, Simon Kelley wrote:
We are also seeing some userspace file operations fail with the error
"No locks available". These don't generate any logging on the client, so
I don't have exact timing. It's possible that they are associated with
further ###
Hello!
On Feb 10, 2009, at 12:46 PM, Simon Kelley wrote:
If by "the complete event" you mean the "received cancel for unknown
cookie" message, there's not much more to tell. Grepping through the
last month's server logs shows that there are bursts of typically
between 3 and 7 messages, at the same