Re: [Lustre-discuss] Fwd: Reg /// OSS rebooted automatically

2010-12-21 Thread Jeff Johnson
Daniel, In the future you might want to consider posting some entries or pieces of a log rather than the entire log file. =) Was this from the OSS that you say was rebooting or from your MDS node? I would look at the log file of the OSS node(s) that contain OST0006 and OST0007 and see if

Re: [Lustre-discuss] Fwd: Reg /// OSS rebooted automatically

2010-12-21 Thread Brian J. Murrell
On Tue, 2010-12-21 at 11:13 +0530, Daniel Raj wrote: I am Daniel. My OSS getting automatically rebooted again and again. If you mean a full reboot and not a panic, this is very likely not a Lustre problem. *kernel: LustreError: 23351:0:(ldlm_lib.c:1892:target_send_reply_msg()) @@@

Re: [Lustre-discuss] how to reuse OST indices (EADDRINUSE)

2010-12-21 Thread Andreas Dilger
On 2010-12-21, at 8:58, Charles Taylor tay...@hpc.ufl.edu wrote: So we are evacuating all the OSTs, replacing the Areca 1680ix cards with Adaptec 51645s, re-initializing the LUNs, reformatting the LUNs as OSTs (using the same OST index as before) and remounting them. That is the plan

Re: [Lustre-discuss] how to reuse OST indices (EADDRINUSE)

2010-12-21 Thread Charles Taylor
On Dec 21, 2010, at 12:39 PM, Andreas Dilger wrote: It's unfortunate that you didn't see the thread from a few weeks ago that discussed this exact topic of OST replacement. Agreed. :( It should get a section in the manual I think. Agreed. This file is at /O/0/LAST_ID (capital 'o' then

Re: [Lustre-discuss] write RPC congestion

2010-12-21 Thread Oleg Drokin
Hello! I guess I am a little bit late to the party, but I was just reading comments in bug 16900 and have this question I really need to ask. On Aug 23, 2010, at 10:58 PM, Jeremy Filizetti wrote: The larger RPCs from bug 16900 offered some significant performance when working over the WAN.

Re: [Lustre-discuss] write RPC congestion

2010-12-21 Thread Jeremy Filizetti
In the attachment I created that Andreas posted at https://bugzilla.lustre.org/attachment.cgi?id=31423 if you look at graph 1 and 2 they are both using larger than default max_rpcs_in_flight. I believe the data without the patch from bug 16900 had max_rpcs_in_flight=42. For the data with the

Re: [Lustre-discuss] write RPC congestion

2010-12-21 Thread Oleg Drokin
Hello! On Dec 22, 2010, at 12:43 AM, Jeremy Filizetti wrote: In the attachment I created that Andreas posted at https://bugzilla.lustre.org/attachment.cgi?id=31423 if you look at graph 1 and 2 they are both using larger than default max_rpcs_in_flight. I believe the data without the