[Lustre-discuss] node rebooted giving following error messages.

2010-12-03 Thread akshar bhosale
Hi, one of our nodes rebooted, giving the following error messages: Nov 26 14:27:41 yn70 pidof[25400]: can't read sid from /proc/25361/stat Nov 26 20:18:33 node70 kernel: LustreError: 1941:0:(file.c:2925:ll_inode_revalidate_fini()) failure -43 inode 174555137 Nov 26 20:23:34 node70 kernel: LustreErr

Re: [Lustre-discuss] OST error

2010-12-03 Thread Colin Faber
Hi Bob, Good to hear you've identified and resolved the issue. Sorry to hear you'll have to restore from backup though. -cf On 12/03/2010 02:41 PM, Bob Ball wrote: > Just to cleanly end this thread, the mptctl was out of date. We also > updated megaraid_sas and perc6 firmware. e2fsck found s

Re: [Lustre-discuss] OST error

2010-12-03 Thread Bob Ball
Just to cleanly end this thread, the mptctl was out of date. We also updated megaraid_sas and perc6 firmware. e2fsck found some Block bitmap differences (fixed) at this point, but the OST mounted cleanly and the errors stopped. Unfortunately, there are now corrupted files in the system, that

Re: [Lustre-discuss] lnet rounter immediatelly marked as down

2010-12-03 Thread Michael Kluge
Hi Liang, sure, but my current question is: Why are the nodes within o2ib considering the router as down? I add the route to a node within o2ib and immediately afterwards lctl show_route says the router is down. That does not make much sense to me. And if I try to send a message through the route

Re: [Lustre-discuss] target_send_reply_msg errors

2010-12-03 Thread David Dillow
On Thu, 2010-12-02 at 17:56 -0800, Andrus, Brian Contractor wrote: > I am seeing a TON of messages like: > > LustreError: 19122:0:(ldlm_lib.c:1892:target_send_reply_msg()) @@@ processing > error (-107) r...@8105cbdd2c00 x1348234052426436/t0 o400->@:0/0 > lens 192/0 e 0 to 0 dl 1291340852 r

Re: [Lustre-discuss] lnet rounter immediatelly marked as down

2010-12-03 Thread liang Zhen
Hi Michael, To add a router dynamically, you also have to run "--net o2ib add_route a.b@tcp1" on all nodes of tcp1, so the better choice is to use a universal modprobe.conf that defines "ip2nets" and "routes". You can see some examples here: http://wiki.lustre.org/manual/LustreManual18_HTML/Mo

Re: [Lustre-discuss] fsck.ext4 for device ... exited with signal 11.

2010-12-03 Thread Craig Prescott
Andreas Dilger wrote: > On 2010-12-02, at 09:24, Craig Prescott wrote: >> fsck seems to be spending a lot of time in Pass1D, cloning >> multiply-claimed blocks. But there is no output from fsck in many hours >> now, > > Pass 1b-1d have O(n^2) complexity, and require a second pass through all of

[Lustre-discuss] lnet rounter immediatelly marked as down

2010-12-03 Thread Michael Kluge
Hi list, we have Lustre 1.6.7.2 running on our (IB SDR) cluster and have added one additional NIC (tcp1) to one node and would like to use this node as a router. I have added an ip2nets statement and forwarding=enabled to the modprobe files on the router and reloaded the modules. I see two NIDs now and no