[Lustre-discuss] lock timeouts and OST evictions on 1.4 server - 1.6 client system.

2009-02-10 Thread Simon Kelley
We're seeing problems which seem to be lock-related and result in data loss. This is a fairly low probability event: it's happening on a ~1000 core compute farm which is heavily loaded and the frequency is of the order of tens of failures an hour. The clients are running kernel 2.6.22 and lustre

[Lustre-discuss] subscribe

2009-02-10 Thread Brian Stone
___ Lustre-discuss mailing list Lustre-discuss@lists.lustre.org http://lists.lustre.org/mailman/listinfo/lustre-discuss

[Lustre-discuss] (no subject)

2009-02-10 Thread Brian Stone
subscribe

Re: [Lustre-discuss] problem mounting two different lustre instances

2009-02-10 Thread Götz Waschk
On Tue, Feb 10, 2009 at 1:44 AM, Isaac Huang he.hu...@sun.com wrote: On Mon, Feb 09, 2009 at 04:52:20PM +0100, Götz Waschk wrote: My client has this in modprobe.conf: options lnet networks=o2ib,tcp I'm trying to mount the remote network with mount -t lustre 141.34.228...@tcp0:/atlas

Re: [Lustre-discuss] problem mounting two different lustre instances

2009-02-10 Thread Isaac Huang
On Mon, Feb 09, 2009 at 04:52:20PM +0100, Götz Waschk wrote: Hello everyone, . My client has this in modprobe.conf: options lnet networks=o2ib,tcp I'm trying to mount the remote network with mount -t lustre 141.34.228...@tcp0:/atlas /scratch/lustre-1.6/atlas and the command just hangs,

Re: [Lustre-discuss] problem mounting two different lustre instances

2009-02-10 Thread Götz Waschk
On Tue, Feb 10, 2009 at 1:45 PM, Johann Lombardi joh...@sun.com wrote: 38 = MDS_CONNECT. The client tries to reach the MDT via 192.168.22...@o2ib, whereas I think it should use tcp to access the lustre filesystem of the remote cluster, is my understanding of your configuration correct? That's

Re: [Lustre-discuss] lock timeouts and OST evictions on 1.4 server - 1.6 client system.

2009-02-10 Thread Simon Kelley
Oleg Drokin wrote: What would be useful here is if you can enable dlm tracing (echo +dlm_trace > /proc/sys/lnet/debug) on some of those 1.6 nodes (also if you are running with no debug enabled at all, also enable rpc_trace and info levels) and also enable dump on eviction feature.

Re: [Lustre-discuss] lock timeouts and OST evictions on 1.4 server - 1.6 client system.

2009-02-10 Thread Oleg Drokin
Hello! On Feb 10, 2009, at 12:11 PM, Simon Kelley wrote: We are also seeing some userspace file operations fail with the error "No locks available". These don't generate any logging on the client so I don't have exact timing. It's possible that they are associated with further ###

Re: [Lustre-discuss] lock timeouts and OST evictions on 1.4 server - 1.6 client system.

2009-02-10 Thread Oleg Drokin
Hello! On Feb 10, 2009, at 12:46 PM, Simon Kelley wrote: If, by "the complete event" you mean the "received cancel for unknown cookie", there's not much more to tell. Grepping through the last month's server logs shows that there are bursts of typically between 3 and 7 messages, at the same