Re: [Lustre-discuss] Lustre clients failing, and cant reconnect

2008-09-13 Thread Brian J. Murrell
On Sat, 2008-09-13 at 11:44 -0400, Mag Gam wrote:
> By disabling statahead_max what consequences can a client face?
>
> # echo 0 > /proc/fs/lustre/.../statahead_max

The client does not benefit from the optimizations that can be done to speed up ls -l.

b.
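
For anyone applying the workaround above on several clients, here is a minimal sketch. It assumes the statahead_max entries live somewhere under /proc/fs/lustre, as the quoted command suggests, and locates them with find instead of guessing the elided part of the path. The function name disable_statahead is hypothetical, not a Lustre tool.

```shell
#!/bin/sh
# Hypothetical helper: write 0 to every statahead_max file found under ROOT.
# ROOT defaults to /proc/fs/lustre; the intermediate path components are
# discovered with find rather than assumed.
disable_statahead() {
    root="${1:-/proc/fs/lustre}"
    find "$root" -name statahead_max 2>/dev/null | while IFS= read -r f; do
        old=$(cat "$f")
        # Report each change so the previous value can be restored later.
        echo 0 > "$f" && echo "$f: $old -> 0"
    done
}

# Run as root on each client:
# disable_statahead
```

A value written this way is a runtime setting, so it likely needs to be reapplied after a remount or reboot. Writing the reported old value back should restore the ls -l optimization mentioned above.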

Re: [Lustre-discuss] Lustre clients failing, and cant reconnect

2008-09-13 Thread Mag Gam
By disabling statahead_max what consequences can a client face?

# echo 0 > /proc/fs/lustre/.../statahead_max

On Fri, Sep 5, 2008 at 11:10 AM, Brian J. Murrell <[EMAIL PROTECTED]> wrote:
> On Fri, 2008-09-05 at 10:48 -0400, Jerome, Ron wrote:
>> SERVER LOG

Re: [Lustre-discuss] Lustre clients failing, and cant reconnect

2008-09-05 Thread Brian J. Murrell
On Fri, 2008-09-05 at 10:48 -0400, Jerome, Ron wrote:
> SERVER LOG
> =
> Aug 31 04:02:04 oss1 syslogd 1.4.1: restart.
> Sep 1 15:15:42 oss1 kernel: LustreError:
> 2450:0:(ldlm_lock.c:430:__ldlm_handle2lock()) ASS

Re: [Lustre-discuss] Lustre clients failing, and cant reconnect

2008-09-05 Thread Jerome, Ron
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:lustre-discuss-
> [EMAIL PROTECTED] On Behalf Of Brock Palen
> Sent: September 5, 2008 10:22 AM
> To: Brian J. Murrell
> Cc: Lustre Discuss
> Subject: Re: [Lustre-discuss] Lustre clients failing, and cant
> reconnect

Re: [Lustre-discuss] Lustre clients failing, and cant reconnect

2008-09-05 Thread Brock Palen
I had to reboot the MDS to get the problem to go away. I will watch and see if it reappears.

I screwed up and deleted the wrong /var/log/messages, so I don't have the messages. I am watching this issue.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734)936-

Re: [Lustre-discuss] Lustre clients failing, and cant reconnect

2008-09-05 Thread Brian J. Murrell
On Fri, 2008-09-05 at 00:15 -0400, Brock Palen wrote:
> Looks like that didn't fix it. One of the login nodes repeated the
> behavior.

So what are the messages the client logged when the problem occurred? And what, if anything, was logged on the MDS at the same time?

b.

Re: [Lustre-discuss] Lustre clients failing, and cant reconnect

2008-09-04 Thread Brock Palen
Looks like that didn't fix it. One of the login nodes repeated the behavior. The strange thing is that the MDS does not show anything about the NID of the client. The client just says it lost connection with it, but the MDS never says it has not heard from the client and is kicking it out

Re: [Lustre-discuss] Lustre clients failing, and cant reconnect

2008-09-04 Thread Brock Palen
>> >> Is this enough information?
>
> Probably. If you are running 1.6.5, try disabling statahead on all of
> your clients...
>
> # echo 0 > /proc/fs/lustre/.../statahead_max

I thought statahead was fixed in 1.6.5? Main reason we upgraded. Login nodes already are showing that behavior again. I

Re: [Lustre-discuss] Lustre clients failing, and cant reconnect

2008-09-04 Thread Brian J. Murrell
On Thu, 2008-09-04 at 22:58 -0400, Brock Palen wrote:
> I am having clients lose their connection to the MDS. Messages on
> the clients look like this:
>
> Sep 4 19:51:30 nyx-login2 kernel: Lustre: nobackup-MDT-
> mdc-0101fc44e800: Connection to service nobackup-MDT via nid
> [E

[Lustre-discuss] Lustre clients failing, and cant reconnect

2008-09-04 Thread Brock Palen
I am having clients lose their connection to the MDS. Messages on the clients look like this:

Sep 4 19:51:30 nyx-login2 kernel: Lustre: nobackup-MDT- mdc-0101fc44e800: Connection to service nobackup-MDT via nid [EMAIL PROTECTED] was lost; in progress operations using this servic