Re: [Lustre-discuss] 1.6.5

2008-05-16 Thread Andreas Dilger
On May 15, 2008 14:15 -0700, Jim Garlick wrote: My lustre/ChangeLog for 1.6.5-RC2 says: * version 1.6.5 * Support for kernels: 2.6.5-7.311 (SLES 9), 2.6.9-67.0.7.EL (RHEL 4), 2.6.16.54-0.2.5 (SLES 10), 2.6.18-53.1.14.el5 (RHEL 5),

[Lustre-discuss] kernel panic with 1.6.5rc2 on mds

2008-05-16 Thread Patrick Winnertz
Hello, As I wrote in #11742 [1], I experienced a kernel panic on the mds after doing heavy I/O on the 1.6.5rc2 cluster. Since nobody has responded to this bug so far (and I think in other cases the lustre team is _really_ fast (thanks for that :))) I fear that it was not noticed by anybody.

Re: [Lustre-discuss] client randomly evicted

2008-05-16 Thread Yong Fan
Robin Humble wrote: On Thu, May 15, 2008 at 08:23:20AM -0400, Aaron Knister wrote: Ah! That would make a lot of sense. echoing 0 to statahead_count doesn't really do anything other than hang my session. Thanks! I think the hang echo'ing into /proc is another bug, but yeah, deal with

Re: [Lustre-discuss] Can lustre be trusted to keep my data safe?

2008-05-16 Thread jrs
Greetings all, Thanks to everyone who offered comments on my original question about lustre's trustworthiness. I apologize if I seem to be flogging this issue to death but I have to make a decision in the next day or two about whether to move ahead with deploying lustre and I'm probably more

[Lustre-discuss] Luster access locking up login nodes

2008-05-16 Thread Brock Palen
I have seen this behavior a few times. Under heavy IO lustre will just stop and dmesg will have the following: LustreError: 3976:0:(events.c:134:client_bulk_callback()) event type 0, status -5, desc 01012ce12000 LustreError: 11-0: an error occurred while communicating with [EMAIL

Re: [Lustre-discuss] Luster access locking up login nodes

2008-05-16 Thread Brian J. Murrell
On Fri, 2008-05-16 at 15:48 -0400, Brock Palen wrote: I have seen this behavior a few times. Under heavy IO lustre will just stop and dmesg will have the following: Review the list archives for statahead problems. b.

Re: [Lustre-discuss] Luster access locking up login nodes

2008-05-16 Thread Brock Palen
Ahh didn't realize this was related to that. Good to know fix in the works (2 x4500's on the way so we have made a commitment to lustre). How would I make this option the default on boot? There isn't an llite module I see on the clients. I can

Re: [Lustre-discuss] Luster access locking up login nodes

2008-05-16 Thread Brian J. Murrell
On Fri, 2008-05-16 at 16:24 -0400, Brock Palen wrote: Ahh didn't realize this was related to that. Sure looks like it from what you posted. How would I make this option the default on boot? initscript. b.

Re: [Lustre-discuss] Looping in __d_lookup

2008-05-16 Thread Oleg Drokin
Hello! On May 15, 2008, at 5:34 AM, Jakob Goldbach wrote: On a regular basis a process get stuck in __d_lookup. When I dig in it seems I'm caught in the hlist_for_each_entry_rcu loop never satisfying the exit-from-loop condition. Hm, so you actually have a circular loop? I wonder if you can

[Lustre-discuss] Help reviving a 1.4.x volume with a destroyed OST

2008-05-16 Thread Klaus Steden
Hello there, We had a bit of an accident in one of our labs earlier today, and it effectively destroyed one of the OSTs in the Lustre file system. From what I can figure (I wasn't there at the time), one of the OSSes re-provisioned itself accidentally, and installed its OS information on one of