Re: [Lustre-discuss] questions about an OST content

2010-11-10 Thread Andreas Dilger
On 2010-11-10, at 14:40, Bob Ball wrote: > Yes, this brought us back up (sorry, took us a while). Clients see the > system, and I can read and write files. But.. > > What have we lost by doing this? Can we now let it go and recover as usual? > What is the next step here? The abort_recov

Re: [Lustre-discuss] questions about an OST content

2010-11-10 Thread Bob Ball
Yes, this brought us back up (sorry, took us a while). Clients see the system, and I can read and write files. But.. What have we lost by doing this? Can we now let it go and recover as usual? What is the next step here? bob On 11/10/2010 3:00 PM, Andreas Dilger wrote: > On 2010-11-10,

Re: [Lustre-discuss] questions about an OST content

2010-11-10 Thread Andreas Dilger
On 2010-11-10, at 11:01, Bob Ball wrote: > Well, we ran 2 days, migrating files off OST, then this morning, the MDT > crashed. Could not get all clients reconnected before seeing another > kernel panic on the mdt. did an e2fsck of the mdt db and tried again. > crashed again, but this time the

Re: [Lustre-discuss] questions about an OST content

2010-11-10 Thread Bob Ball
If this helps, the console shows this stuff at the kernel panic, leaving out most of the addresses and offsets for this "retyping" bob :ptlrpc:ldlm_handle_enqueue :mds:mds_handle :lnet:lnet_match_blocked_msg :ptlrpc:lustre_msg_get_conn_cnt :ptlrpc:ptlrpc_server_handle_request __activate_task try

Re: [Lustre-discuss] questions about an OST content

2010-11-10 Thread Bob Ball
Well, we ran 2 days, migrating files off OST, then this morning, the MDT crashed. Could not get all clients reconnected before seeing another kernel panic on the mdt. did an e2fsck of the mdt db and tried again. crashed again, but this time the logged message is: 2010-11-10T12:40:26-05:00 lm

Re: [Lustre-discuss] questions about an OST content

2010-11-10 Thread Andreas Dilger
On 2010-11-09, at 03:07, Aurelien Degremont wrote: > Andreas Dilger wrote: >>> Cold replace: >>> 1 - Empty your OST >>> 2 - Stop your filesystem >>> 3 - Replace/reformat using the same index >>> 4 - Restart using --writeconf >>> 5 - Remount the clients >> 6 - fix up the MDS's idea of the OST's l

Re: [Lustre-discuss] questions about an OST content

2010-11-09 Thread Aurelien Degremont
Hi. Andreas Dilger wrote: >> Cold replace: >> 1 - Empty your OST >> 2 - Stop your filesystem >> 3 - Replace/reformat using the same index >> 4 - Restart using --writeconf >> 5 - Remount the clients > 6 - fix up the MDS's idea of the OST's last-allocated object. > >> Did I miss something? > >

Re: [Lustre-discuss] questions about an OST content

2010-11-08 Thread Andreas Dilger
On 2010-11-08, at 14:18, Aurélien Degrémont wrote: > Tell me if I'm wrong regarding this OST update. > AFAIK, there are two ways to replace an OST with a new one: > > Hot replace: > 1 - Disable your OST on MDT (lctl deactivate) > 2 - Empty your OST > 3 - Back up the magic files (last_rcvd, LAST_ID, CO

Re: [Lustre-discuss] questions about an OST content

2010-11-08 Thread Bob Ball
Yes, you are correct. That was the key here, did not put that file back in place. Back up and (so far) operating cleanly. Thanks, bob On 11/8/2010 3:04 PM, Andreas Dilger wrote: > On 2010-11-08, at 11:39, Bob Ball wrote: >> Don't know if I sent to the whole list. One of those days. >> >> rema

Re: [Lustre-discuss] questions about an OST content

2010-11-08 Thread Aurélien Degrémont
On 08/11/2010 21:04, Andreas Dilger wrote: > Looks like you didn't copy the old "CONFIGS/mountdata" file over the new one. > You can also use "--writeconf" (described in the manual and several times on > the list) to have the MGS re-generate the configuration, which should fix > this as well

Re: [Lustre-discuss] questions about an OST content

2010-11-08 Thread Andreas Dilger
On 2010-11-08, at 11:39, Bob Ball wrote: > Don't know if I sent to the whole list. One of those days. > > remade the raid device, remade the lustre fs on it, but the disks won't > mount. Error is below. How do I overcome this? > > mounting device /dev/sdc at /mnt/ost12, flags=0 options=device

Re: [Lustre-discuss] questions about an OST content

2010-11-08 Thread Bob Ball
Don't know if I sent to the whole list. One of those days. remade the raid device, remade the lustre fs on it, but the disks won't mount. Error is below. How do I overcome this? Thanks, bob mounting device /dev/sdc at /mnt/ost12, flags=0 options

Re: [Lustre-discuss] questions about an OST content

2010-11-08 Thread Bob Ball
OK, made new raid, made file system with same index, but they won't mount. This is the error. What can I do here? bob mounting device /dev/sdc at /mnt/ost12, flags=0 options=device=/dev/sdc mount.lustre: mount /dev/sdc at /mnt/ost12 failed: Address alread

Re: [Lustre-discuss] questions about an OST content

2010-11-08 Thread Andreas Dilger
On 2010-11-07, at 12:32, Bob Ball wrote: > Tomorrow, we will redo all 8 OST on the first file server we are redoing. I > am very nervous about this, as a lot is riding on us doing this correctly. > For example, on a client now, if I umount one of the ost, without first > taking some (unknown

Re: [Lustre-discuss] questions about an OST content

2010-11-08 Thread Andreas Dilger
None of the Lustre config files stores the OST size, so it should be fine. Note that even if your OST isn't empty, you can just copy over all of the files into the newly-formatted filesystem, so long as you copy the xattrs with them. Cheers, Andreas On 2010-11-07, at 12:43, Bob Ball wrote:

Re: [Lustre-discuss] questions about an OST content

2010-11-07 Thread Bob Ball
Thanks, Ashley. No quotas, fortunately. Tomorrow will be "fun". bob On 11/7/2010 3:44 PM, Ashley Pittman wrote: > On 7 Nov 2010, at 19:32, Bob Ball wrote: >> So, while we are doing the reformat, is there any way to avoid this >> "hang" situation? > I believe there is but it escapes me at the mi

Re: [Lustre-discuss] questions about an OST content

2010-11-07 Thread Ashley Pittman
On 7 Nov 2010, at 19:32, Bob Ball wrote: > So, while we are doing the reformat, is there any way to avoid this > "hang" situation? I believe there is but it escapes me at the minute. > Is the --index=XX argument to mkfs.lustre hex, or decimal? Seems from > your comment below that this must be

Re: [Lustre-discuss] questions about an OST content

2010-11-07 Thread Bob Ball
BTW, the new OST sizes will be much different from the original OST sizes. Is the "copy the old file" method below still valid in this case? bob On 11/7/2010 2:32 PM, Bob Ball wrote: > Hi, Andreas. > > Tomorrow, we will redo all 8 OST on the first file server we are > redoing. I am very nervo

Re: [Lustre-discuss] questions about an OST content

2010-11-07 Thread Bob Ball
Hi, Andreas. Tomorrow, we will redo all 8 OST on the first file server we are redoing. I am very nervous about this, as a lot is riding on us doing this correctly. For example, on a client now, if I umount one of the ost, without first taking some (unknown to me) action on the MDT, then the

Re: [Lustre-discuss] questions about an OST content

2010-11-06 Thread Ashley Pittman
On 6 Nov 2010, at 19:28, Bob Ball wrote: > I intend to be VERY careful with this. Thank you all. Any further > advice before I do this, likely on Monday, will be greatly appreciated. I believe it is possible to use udev to assign device names to devices which would make the pathname both cons

Re: [Lustre-discuss] questions about an OST content

2010-11-06 Thread Bob Ball
Responding to everyone (and thanks to all). lfs df -i from a client or simply df -i from the OSS node ... This still shows on the order of 100 inodes after the OST was emptied. "tunefs.lustre --print /dev/sdj" will tell you the index in base 10. Yes, this worked. df -H /path/to/OST/mount_point This
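
A side note on the base-10 index mentioned above: Lustre target names conventionally embed the OST index as four hex digits, so a decimal index of 12 shows up as OST000c in the target name. The following is only a minimal sketch of that conversion, not anything from the thread; the helper name ost_target_name and the filesystem name "lustre" are assumptions made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): build the conventional target
# name from a filesystem name and a decimal OST index.
ost_target_name() {
    # $1 = filesystem name (assumed here), $2 = OST index in base 10
    printf '%s-OST%04x\n' "$1" "$2"
}

# "lustre" is an assumed fsname; decimal index 12 prints lustre-OST000c
ost_target_name lustre 12
```

This may help when matching the decimal index reported by tunefs.lustre --print against the hex suffix seen in target names.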

Re: [Lustre-discuss] questions about an OST content

2010-11-06 Thread Andreas Dilger
On 2010-11-06, at 8:24, Bob Ball wrote: > I am emptying a set of OST so that I can reformat the underlying RAID-6 > more efficiently. Two questions: > 1. Is there a quick way to tell if the OST is really empty? lfs_find > takes many hours to run. If you mount the OST as type ldiskfs and look

Re: [Lustre-discuss] questions about an OST content

2010-11-06 Thread Joe Landman
On 11/06/2010 10:24 AM, Bob Ball wrote: > I am emptying a set of OST so that I can reformat the underlying RAID-6 > more efficiently. Two questions: > 1. Is there a quick way to tell if the OST is really empty? lfs_find > takes many hours to run. Yes ... df -H /path/to/OST/mount_point

Re: [Lustre-discuss] questions about an OST content

2010-11-06 Thread Ashley Pittman
On 6 Nov 2010, at 14:24, Bob Ball wrote: > I am emptying a set of OST so that I can reformat the underlying RAID-6 > more efficiently. Two questions: > 1. Is there a quick way to tell if the OST is really empty? lfs_find > takes many hours to run. lfs df -i from a client or simply df -i from

[Lustre-discuss] questions about an OST content

2010-11-06 Thread Bob Ball
I am emptying a set of OST so that I can reformat the underlying RAID-6 more efficiently. Two questions: 1. Is there a quick way to tell if the OST is really empty? lfs_find takes many hours to run. 2. When I reformat, I want it to retain the same ID so as to not make "holes" in the list. Fro