Re: [Lustre-discuss] recovering formatted OST

2010-10-21 Thread Andreas Dilger
On 2010-10-21, at 18:44, Wojciech Turek wrote: > fsck has finished and does not find any more errors to correct. However, when > I try to mount the device as ldiskfs, the kernel panics with the following message: > > Assertion failure in cleanup_journal_tail() at fs/jbd/checkpoint.c:459: > "blocknr !=

Re: [Lustre-discuss] recovering formatted OST

2010-10-21 Thread Wojciech Turek
Hi, fsck has finished and does not find any more errors to correct. However, when I try to mount the device as ldiskfs, the kernel panics with the following message: Assertion failure in cleanup_journal_tail() at fs/jbd/checkpoint.c:459: "blocknr != 0" --- [cut here ] - [please bite here ]

Re: [Lustre-discuss] vanilla kernel with 2.0 version

2010-10-21 Thread jherold
Hello! In the message dated 20 October 2010 (17:57:34), it was written: > I assume your question is related to the server, since clients > generally work with vanilla kernels. > We are working on the RHEL6 2.6.32 for Lustre 2.1 (available in > bugzilla), and I'd hope that this will also work fairl

Re: [Lustre-discuss] recovering formatted OST

2010-10-21 Thread Wojciech Turek
Thanks Ken, that worked. On 21 October 2010 17:39, Ken Hornstein wrote: > >Now I have another problem. After the last segfault I cannot restart the fsck > >due to MMP. > >[...] > >Also when I try to access the filesystem via debugfs it fails: > > > >debugfs -c -R 'ls' /dev/scratch2_ost16vg/ost16lv > >d

Re: [Lustre-discuss] recovering formatted OST

2010-10-21 Thread Wojciech Turek
Hi Bernd, Thanks for the tip. I don't have high hopes of recovering too much, but from where I stand I have nothing to lose. The failed OST was part of the scratch filesystem, so in theory the data weren't that sensitive. However, some people would be very happy if they could recover any data. Best r

Re: [Lustre-discuss] recovering formatted OST

2010-10-21 Thread Bernd Schubert
Hello Wojciech Turek, On Thursday, October 21, 2010, Wojciech Turek wrote: > Hi Andreas, > > I have restarted fsck after the segfault and it ran for several hours and > it segfaulted again. > > Pass 3A: Optimizing directories > Failed to optimize directory ??? (73031): EXT2 directory corrupted

Re: [Lustre-discuss] recovering formatted OST

2010-10-21 Thread Ken Hornstein
>Now I have another problem. After the last segfault I cannot restart the fsck >due to MMP. >[...] >Also when I try to access the filesystem via debugfs it fails: > >debugfs -c -R 'ls' /dev/scratch2_ost16vg/ost16lv >debugfs 1.41.10.sun2 (24-Feb-2010) >/dev/scratch2_ost16vg/ost16lv: MMP: fsck being run whi

Re: [Lustre-discuss] controlling which eth interface lustre uses

2010-10-21 Thread Christopher J.Walker
Charles Taylor wrote: > On Oct 21, 2010, at 9:51 AM, Brock Palen wrote: > >> On Oct 21, 2010, at 9:48 AM, Joe Landman wrote: >> >>> On 10/21/2010 09:37 AM, Brock Palen wrote: We recently added a new oss, it has 1 1Gb interface and 1 10Gb interface, The 10Gb interface is eth4 10

Re: [Lustre-discuss] recovering formatted OST

2010-10-21 Thread Wojciech Turek
Hi Andreas, I have restarted fsck after the segfault and it ran for several hours and it segfaulted again. Pass 3A: Optimizing directories Failed to optimize directory ??? (73031): EXT2 directory corrupted Failed to optimize directory ??? (73041): EXT2 directory corrupted Failed to optimize direc

Re: [Lustre-discuss] recovering formatted OST

2010-10-21 Thread Andreas Dilger
Having a bit more context would help see where the problem is. It may just be that with the other filesystems being formatted on top of the original that the filesystem is unrecoverable. E2fsck ran out of memory, but there shouldn't be a 2GB directory in the filesystem either, so it seems thi

Re: [Lustre-discuss] controlling which eth interface lustre uses

2010-10-21 Thread Lundgren, Andrew
Just as a FYI, you can set most of the bonding options in the ifcfg-bond0 file. IE: BONDING_OPTS="arp_ip_target=10.248.58.254 arp_interval=500 mode=active-backup primary=eth0" Then your modprobe.conf only needs: alias bond0 bonding -Original Message- From: lustre-discuss-boun...@lists
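A minimal sketch of what such an ifcfg-bond0 file might look like on a RHEL-style system, using the BONDING_OPTS line quoted above. The IPADDR and NETMASK values are placeholders and are not taken from this thread; the file is assumed to live under /etc/sysconfig/network-scripts/.

```shell
# ifcfg-bond0 (sketch; adjust to your site)
DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
# Placeholder address and netmask, NOT from the thread
IPADDR=192.0.2.10
NETMASK=255.255.255.0
# Bonding options as quoted in the message above
BONDING_OPTS="arp_ip_target=10.248.58.254 arp_interval=500 mode=active-backup primary=eth0"
```

Each slave interface's own ifcfg file would normally also carry MASTER=bond0 and SLAVE=yes so that it is enslaved to the bond at boot.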

Re: [Lustre-discuss] controlling which eth interface lustre uses

2010-10-21 Thread Joe Landman
On 10/21/2010 10:29 AM, Brock Palen wrote: > > >> Why do you need both active? If one is a backup to the other, then >> bond them as a primary/backup pair, meaning only one will be active >> at a time, i.e., your designated primary (unless it goes down). > > We could do this, the 10Gb drivers hav

Re: [Lustre-discuss] controlling which eth interface lustre uses

2010-10-21 Thread Bob Ball
OK, quick startup on bonding, as we use it for our OSS here. We have 2 NICs we bond (SL5.5, an RHEL variant), eth1 at 1Gb and eth2 at 10Gb using Myricom hardware. 10.10.1.2 is the network gateway, a convenient arp target that should always be up. [r...@umdist04 network-scripts]# cat ifcfg-bond

Re: [Lustre-discuss] controlling which eth interface lustre uses

2010-10-21 Thread Brock Palen
On Oct 21, 2010, at 10:35 AM, Brian J. Murrell wrote: > On Thu, 2010-10-21 at 10:29 -0400, Brock Palen wrote: >> >> We could do this, the 10Gb drivers have been such a pain for us we wanted to >> have a 'back door' management network to get to the box should we have >> issues with the 10Gb dri

Re: [Lustre-discuss] controlling which eth interface lustre uses

2010-10-21 Thread Brian J. Murrell
On Thu, 2010-10-21 at 10:29 -0400, Brock Palen wrote: > > We could do this, the 10Gb drivers have been such a pain for us we wanted to > have a 'back door' management network to get to the box should we have issues > with the 10Gb driver. If you really do want two separate networks, one for Lu

Re: [Lustre-discuss] controlling which eth interface lustre uses

2010-10-21 Thread Wojciech Turek
Maybe I am missing the point here, but can you explain to me why you would need to have two NICs in one host on the same subnet? If you need an additional access route to your host, why not configure eth0 on a different subnet? On 21 October 2010 15:29, Brock Palen wrote: > > > > Why do you need both act

Re: [Lustre-discuss] controlling which eth interface lustre uses

2010-10-21 Thread Brock Palen
> Why do you need both active? If one is a backup to the other, then bond > them as a primary/backup pair, meaning only one will be active at a > time, i.e., your designated primary (unless it goes down). We could do this, the 10Gb drivers have been such a pain for us we wanted to have a 'b

Re: [Lustre-discuss] controlling which eth interface lustre uses

2010-10-21 Thread Charles Taylor
On Oct 21, 2010, at 9:51 AM, Brock Palen wrote: > On Oct 21, 2010, at 9:48 AM, Joe Landman wrote: > >> On 10/21/2010 09:37 AM, Brock Palen wrote: >>> We recently added a new oss, it has 1 1Gb interface and 1 10Gb >>> interface, >>> >>> The 10Gb interface is eth4 10.164.0.166 The 1Gb interface i

Re: [Lustre-discuss] controlling which eth interface lustre uses

2010-10-21 Thread Bob Ball
Why do you need both active? If one is a backup to the other, then bond them as a primary/backup pair, meaning only one will be active at a time, i.e., your designated primary (unless it goes down). bob On 10/21/2010 9:51 AM, Brock Palen wrote: > On Oct 21, 2010, at 9:48 AM, Joe Landman wrote

Re: [Lustre-discuss] controlling which eth interface lustre uses

2010-10-21 Thread Brock Palen
On Oct 21, 2010, at 9:48 AM, Joe Landman wrote: > On 10/21/2010 09:37 AM, Brock Palen wrote: >> We recently added a new oss, it has 1 1Gb interface and 1 10Gb >> interface, >> >> The 10Gb interface is eth4 10.164.0.166 The 1Gb interface is eth0 >> 10.164.0.10 > > They look like they are on the

Re: [Lustre-discuss] controlling which eth interface lustre uses

2010-10-21 Thread Joe Landman
On 10/21/2010 09:37 AM, Brock Palen wrote: > We recently added a new oss, it has 1 1Gb interface and 1 10Gb > interface, > > The 10Gb interface is eth4 10.164.0.166 The 1Gb interface is eth0 > 10.164.0.10 They look like they are on the same subnet if you are using /24 ... > > In modprobe.conf I

[Lustre-discuss] controlling which eth interface lustre uses

2010-10-21 Thread Brock Palen
We recently added a new oss, it has 1 1Gb interface and 1 10Gb interface, The 10Gb interface is eth4 10.164.0.166 The 1Gb interface is eth0 10.164.0.10 In modprobe.conf I have: options lnet networks=tcp0(eth4) lctl list_nids 10.164.0@tcp From a host I run: lctl which_nid oss4 10.164.0

Re: [Lustre-discuss] recovering formatted OST

2010-10-21 Thread Wojciech Turek
Hi Andreas, I ran e2fsck -fy on the recreated LVM but it segfaulted after running for some time: ... Block #2098188 (938180923) causes directory to be too big. CLEARED. Error storing directory block information (inode=208387, block=0, num=261770): Memory allocation failed Recreate journal? yes Creat