Re: [Lustre-discuss] Client Kernel panic - not syncing. Lustre 1.8.5

2011-05-25 Thread Bob Ball
1.6.6 gave us lots of problems. We are using 1.8.4 here. It has better tools, for one thing, e.g., lfs_migrate. bob On 5/24/2011 7:37 PM, Mag Gam wrote: Stick with 1.6.6, it's a great release! BTW, why did you decide to upgrade to 1.8.x? Is there a feature you are looking for? On Fri, May 20,

Re: [Lustre-discuss] Client Kernel panic - not syncing. Lustre 1.8.5

2011-05-24 Thread Mag Gam
Stick with 1.6.6, it's a great release! BTW, why did you decide to upgrade to 1.8.x? Is there a feature you are looking for? On Fri, May 20, 2011 at 2:48 PM, Aaron Everett aever...@forteds.com wrote: Thanks for the tip. I've already updated with the LU-286 patch, but I'll build new rpms with

Re: [Lustre-discuss] Client Kernel panic - not syncing. Lustre 1.8.5

2011-05-20 Thread Johann Lombardi
On Thu, May 19, 2011 at 01:57:33PM -0400, Aaron Everett wrote: Sorry for the noise. I cleaned everything up, untarred a fresh copy of [...] No problem. BTW, while you are patching the Lustre client, you might also want to apply the following patch http://review.whamcloud.com/#change,457, which fixes a memory

Re: [Lustre-discuss] Client Kernel panic - not syncing. Lustre 1.8.5

2011-05-20 Thread Aaron Everett
Thanks for the tip. I've already updated with the LU-286 patch, but I'll build new rpms with both patches and roll that out too. Since updating with the LU-286 patch Lustre has been running cleanly. Thanks for the support and the work! Aaron On Fri, May 20, 2011 at 4:40 AM, Johann Lombardi

Re: [Lustre-discuss] Client Kernel panic - not syncing. Lustre 1.8.5

2011-05-19 Thread Aaron Everett
Thanks for the replies. Does the patch need to be applied to the clients' lustre-module rpms only, to both the client and server lustre-module rpms, or will I need to build new kernels for the servers as well? Best regards, Aaron On Wed, May 18, 2011 at 9:17 PM, Johann Lombardi joh...@whamcloud.com wrote:

Re: [Lustre-discuss] Client Kernel panic - not syncing. Lustre 1.8.5

2011-05-19 Thread Johann Lombardi
On Thu, May 19, 2011 at 11:06:08AM -0400, Aaron Everett wrote: Thanks for the replies. Does the patch need to be applied to the clients' lustre-module rpms only, to both the client and server lustre-module rpms, or will I need to build new kernels for the servers as well? You only need to apply the

Re: [Lustre-discuss] Client Kernel panic - not syncing. Lustre 1.8.5

2011-05-19 Thread Aaron Everett
Excellent. Thanks for the quick reply. Building new RPMs now. Aaron On Thu, May 19, 2011 at 11:17 AM, Johann Lombardi joh...@whamcloud.com wrote: On Thu, May 19, 2011 at 11:06:08AM -0400, Aaron Everett wrote: Thanks for the replies. Does the patch need to be applied to the clients'

Re: [Lustre-discuss] Client Kernel panic - not syncing. Lustre 1.8.5

2011-05-19 Thread Aaron Everett
I'm getting a build error: make[5]: Entering directory `/usr/src/kernels/2.6.18-238.9.1.el5-x86_64' /usr/src/redhat/BUILD/lustre-1.8.5/lustre/mdc/mdc_lib.c:828: error: conflicting types for 'mdc_getattr_pack' /usr/src/redhat/BUILD/lustre-1.8.5/lustre/mdc/mdc_internal.h:56: error: previous

Re: [Lustre-discuss] Client Kernel panic - not syncing. Lustre 1.8.5

2011-05-19 Thread Johann Lombardi
On Thu, May 19, 2011 at 11:51:49AM -0400, Aaron Everett wrote: I'm getting a build error: make[5]: Entering directory `/usr/src/kernels/2.6.18-238.9.1.el5-x86_64' /usr/src/redhat/BUILD/lustre-1.8.5/lustre/mdc/mdc_lib.c:828: error: conflicting types for 'mdc_getattr_pack'

Re: [Lustre-discuss] Client Kernel panic - not syncing. Lustre 1.8.5

2011-05-19 Thread Aaron Everett
Sorry for the noise. I cleaned everything up, untarred a fresh copy of lustre-1.8.5, applied the patch, configured, and successfully made patches. I'm not sure what went wrong last time. Aaron On Thu, May 19, 2011 at 12:14 PM, Johann Lombardi joh...@whamcloud.com wrote: On Thu, May 19, 2011 at

Re: [Lustre-discuss] Client Kernel panic - not syncing. Lustre 1.8.5

2011-05-18 Thread Aaron Everett
More information: The frequency of these errors was dramatically reduced by changing /proc/fs/lustre/osc/fdfs-OST000[0-3]-osc/max_rpcs_in_flight from 8 to 32. Processor, memory, and disk I/O on the servers are not high; is there a reason not to increase max_rpcs_in_flight from 32 to 48 or 64?
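To make the change above repeatable across all four OSCs, here is a minimal Python sketch, assuming the 1.8-era /proc layout quoted in the message (`/proc/fs/lustre/osc/<fsname>-OST<index>-osc/max_rpcs_in_flight`). The helper name `set_max_rpcs_in_flight` and its `proc_root` parameter are hypothetical; `proc_root` exists only so the sketch can be exercised against a scratch directory instead of a live client. Because client-side OSC directory names may carry an extra per-mount suffix, the pattern globs `-osc*` rather than `-osc`. Writing the real files requires root.

```python
import glob
import os


def set_max_rpcs_in_flight(fsname, value, proc_root="/proc/fs/lustre"):
    """Write `value` to max_rpcs_in_flight for every OSC of `fsname`.

    Hypothetical helper: assumes the 1.8-style layout
    <proc_root>/osc/<fsname>-OST*-osc*/max_rpcs_in_flight.
    Returns the sorted list of files that were updated.
    """
    if not isinstance(value, int) or value < 1:
        raise ValueError("max_rpcs_in_flight must be a positive integer")
    # '-osc*' also matches directory names with a per-mount suffix.
    pattern = os.path.join(proc_root, "osc", f"{fsname}-OST*-osc*",
                           "max_rpcs_in_flight")
    updated = []
    for path in sorted(glob.glob(pattern)):
        # Each OSC exposes its own tunable; write the new value as text.
        with open(path, "w") as f:
            f.write(f"{value}\n")
        updated.append(path)
    return updated
```

For example, `set_max_rpcs_in_flight("fdfs", 32)` would apply the 8 to 32 change from the message to every `fdfs` OST connection on the client and leave OSCs of other filesystems alone.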

Re: [Lustre-discuss] Client Kernel panic - not syncing. Lustre 1.8.5

2011-05-18 Thread Jeremy Filizetti
max_rpcs_in_flight has a max of 256 in 1.8.5 IIRC. The downside is that a single client can consume more resources from the OSS. It looks like you're using ksocklnd for your LND below, so you may also have to increase the ksocklnd peer_credits module parameter, which defaults to 8 as well. Jeremy On
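Jeremy's point is that raising max_rpcs_in_flight alone may not help if LNet runs out of peer credits first. The Python sketch below turns that into a rough sanity check. It rests on two labeled assumptions: the 256 cap is taken from the "IIRC" above, and it treats each in-flight RPC from this client to one OSS as needing roughly one ksocklnd peer credit, so all OSCs talking to the same OSS share one peer_credits budget. `ClientTuning` and `tuning_warnings` are hypothetical names, not part of any Lustre tool.

```python
from dataclasses import dataclass

# Cap recalled in the thread for 1.8.5 ("IIRC"); an assumption, not verified.
MAX_RPCS_IN_FLIGHT_CAP = 256


@dataclass
class ClientTuning:
    max_rpcs_in_flight: int  # per-OSC setting on the client
    peer_credits: int        # ksocklnd module parameter (default 8 per the thread)
    osts_per_oss: int        # OSTs served by one OSS (site-specific)


def tuning_warnings(t):
    """Return human-readable warnings for a proposed client tuning.

    Heuristic only: assumes one in-flight RPC needs about one peer credit,
    and that every OSC talking to the same OSS shares that OSS's credits.
    """
    warnings = []
    if not 1 <= t.max_rpcs_in_flight <= MAX_RPCS_IN_FLIGHT_CAP:
        warnings.append(
            f"max_rpcs_in_flight={t.max_rpcs_in_flight} is outside "
            f"1..{MAX_RPCS_IN_FLIGHT_CAP}")
    # Worst case: every OSC to this OSS has its full window outstanding.
    demand = t.max_rpcs_in_flight * t.osts_per_oss
    if demand > t.peer_credits:
        warnings.append(
            f"up to {demand} RPCs may be in flight to one OSS but "
            f"peer_credits={t.peer_credits}; LNet may throttle first")
    return warnings
```

With the values from this thread, `tuning_warnings(ClientTuning(32, 8, 1))` flags the peer_credits mismatch even for a single OST per OSS. On RHEL5, module parameters such as peer_credits are usually set with an `options ksocklnd ...` line in /etc/modprobe.conf and take effect when the module is reloaded.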

Re: [Lustre-discuss] Client Kernel panic - not syncing. Lustre 1.8.5

2011-05-18 Thread Johann Lombardi
On Tue, May 17, 2011 at 08:13:42PM -0400, Aaron Everett wrote: Code: 48 89 08 31 c9 48 89 12 48 89 52 08 ba 01 00 00 00 83 83 10 RIP [8891ddcd] :mdc:mdc_exit_request+0x6d/0xb0 RSP 81028c137858 CR2: 3877 0Kernel panic - not syncing: Fatal exception This bug was
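The RIP line quoted above uses the kernel's `symbol+offset/size` notation: the fault occurred 0x6d bytes into `mdc_exit_request` in the `mdc` module, a function 0xb0 bytes long. Below is a small Python sketch for pulling those fields out of pasted oops text, assuming the 2.6.18-era `:module:symbol+0xOFF/0xSIZE` format shown in this message. `parse_oops_symbol` is a hypothetical helper, not part of any Lustre or kernel tool.

```python
import re

# Matches "[:module:]symbol+0xOFFSET/0xSIZE" as printed in 2.6.18-era oopses,
# e.g. ":mdc:mdc_exit_request+0x6d/0xb0". The module part is optional.
_SYM_RE = re.compile(
    r"(?::(?P<module>[\w-]+):)?"
    r"(?P<func>[\w.]+)\+0x(?P<off>[0-9a-fA-F]+)/0x(?P<size>[0-9a-fA-F]+)"
)


def parse_oops_symbol(text):
    """Split the first 'symbol+offset/size' token in `text` into its parts."""
    m = _SYM_RE.search(text)
    if m is None:
        raise ValueError(f"no symbol+offset/size token in {text!r}")
    return {
        "module": m.group("module"),  # None for symbols in the core kernel
        "function": m.group("func"),
        "offset": int(m.group("off"), 16),
        "size": int(m.group("size"), 16),
    }
```

Given the parsed offset, gdb run against the module's debug object can usually map it to a source line with `list *(mdc_exit_request+0x6d)`.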

[Lustre-discuss] Client Kernel panic - not syncing. Lustre 1.8.5

2011-05-17 Thread Aaron Everett
Hi all, We've been running Lustre 1.6.6 for several years and are deploying 1.8.5 on some new hardware. When under load we've been seeing random kernel panics on many of the clients. We are running 2.6.18-194.17.1.el5_lustre.1.8.5 on the servers (shared MDT/MGS, and 4 OSTs. We have patchless