Re: [Lustre-discuss] how to reuse OST indices (EADDRINUSE)

2010-12-21 Thread Andreas Dilger
On 2010-12-21, at 8:58, Charles Taylor tay...@hpc.ufl.edu wrote:
 So we are evacuating all the OSTs, replacing the Areca 1680ix cards with 
 Adaptec 51645s, re-initializing the LUNs, reformatting the LUNs as OSTs 
 (using the same OST index as before), and remounting them. That is the 
 plan, anyway.

It's unfortunate that you didn't see the thread from a few weeks ago that 
discussed this exact topic of OST replacement. It should get a section in the 
manual, I think. 

 We've already reformatted (mkfs.lustre) one set of 6 OSTs and did not 
 save the magic files, so we are getting the -EADDRINUSE (address already 
 in use) error for those OSTs. That being the case, I assume we must:
 
 1. Unmount the file system from all clients
 2. Unmount the OSTs
 3. Unmount the MDT
 4. tunefs.lustre --writeconf /dev/mdt
 5. remount the MDT
 6. remount the OSTs (including the reformatted ones)
 7. remount the file system on clients.
 
 1. Is this the correct sequence?
 2. Will this leave all our data intact?
 3. Must we do a writeconf on the OSTs too or just the MDT?

There is actually only a flag on each OST that needs to be changed to make it 
stop trying to register with the MGS and simply act as the OST index you 
formatted it as. 

The flag is in the binary CONFIGS/mountdata file (struct lustre_disk_data), in 
the ldd_flags field (offset 20). It should have only LDD_F_SV_TYPE_OST (2) 
set. 
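For the already-reformatted OSTs, that flag edit can be scripted rather than 
done by hand in a hex editor. The sketch below is only illustrative: it assumes 
ldd_flags is a little-endian 32-bit field at the stated offset (check against 
lustre_disk.h for your Lustre version), it should be run against a backed-up 
copy of CONFIGS/mountdata from the OST mounted as ldiskfs, and the function 
name force_ost_flag is mine, not a Lustre tool.

```python
import struct

# Values taken from the message above: ldd_flags sits at offset 20 of
# the binary CONFIGS/mountdata file, and LDD_F_SV_TYPE_OST == 2.
LDD_FLAGS_OFFSET = 20
LDD_F_SV_TYPE_OST = 0x0002

def force_ost_flag(mountdata_path):
    """Rewrite ldd_flags so only LDD_F_SV_TYPE_OST is set, and return
    the old flag value.  Assumes a little-endian 32-bit field."""
    with open(mountdata_path, "r+b") as f:
        f.seek(LDD_FLAGS_OFFSET)
        (old,) = struct.unpack("<I", f.read(4))
        f.seek(LDD_FLAGS_OFFSET)
        f.write(struct.pack("<I", LDD_F_SV_TYPE_OST))
    return old
```

Typical use would be: mount the OST as ldiskfs, copy CONFIGS/mountdata aside 
as a backup, run the function on the live file, then unmount and mount as 
lustre again.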

 Also, for the remaining OSTs we will save the magic files and 
 restore them after reformatting, which should eliminate the need for 
 the procedure above. With some of the OSTs mounted as ldiskfs, I 
 see the last_rcvd file and the CONFIGS directory but no LAST_ID 
 file. Should the LAST_ID file be there?

This file is at /O/0/LAST_ID (capital 'O', then zero) and should be copied for 
the OSTs you haven't replaced yet, along with the other files. For the 6 OSTs 
that have already been replaced, it can be recreated with a binary editor from 
the value on the MDS (lctl get_param osc.*.prealloc_next_id). Search the list 
or bugzilla for "LAST_ID" for a detailed procedure. 
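The binary-editor step can likewise be sketched in a few lines. Two assumptions 
to flag: the file is taken to hold a single little-endian 64-bit object id 
(the format discussed in the list's LAST_ID recovery threads), and the value 
is taken from lctl get_param osc.*.prealloc_next_id on the MDS. Whether that 
value should be used as-is or adjusted is exactly the next_id-vs-last_id 
question raised later in this thread, so verify against the detailed procedure 
before writing anything.

```python
import struct

def write_last_id(path, object_id):
    """Write O/0/LAST_ID as a single little-endian u64 (assumed on-disk
    format; confirm against the LAST_ID recovery procedure first)."""
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", object_id))

def read_last_id(path):
    """Read the u64 back, e.g. to sanity-check an existing OST's file
    before touching the replaced ones."""
    with open(path, "rb") as f:
        return struct.unpack("<Q", f.read(8))[0]
```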

 On Dec 20, 2010, at 11:18 PM, Wang Yibin wrote:
 
 Hello,
 
 Did you back up the old magic files (last_rcvd, LAST_ID, CONFIGS/*) from 
 the original OSTs and put them back before trying to mount them?
 You probably didn't do that, so when you remount the OSTs with an 
 existing index, the MGS refuses to add them without being told to 
 writeconf, hence -EADDRINUSE.
 The proper way to replace an OST is described in bug 24128.
 
 On 2010-12-21, at 8:33 AM, Craig Prescott wrote:
 
 
 Hello list,
 
 We recently evacuated several OSTs on a single OSS, replaced RAID
 controllers, re-initialized RAIDs for new OSTs, and made new Lustre
 filesystems for them, using the same OST indices as we had before.
 
 The filesystem and all its clients have been up and running the whole
 time.  We disabled the OSTs we were working on on all clients and our
 MGS/MDS (lctl dl shows them as IN everywhere).
 
 Now we want to bring the newly-formatted OSTs back online.  When we 
 try to mount the new OSTs, we get this for each one in the syslog of 
 the OSS that has been under maintenance:
 
 Lustre: mgc10.13.28@o2ib: Reactivating import
 LustreError: 11-0: an error occurred while communicating with 10.13.28@o2ib. The mgs_target_reg operation failed with -98
 LustreError: 6065:0:(obd_mount.c:1097:server_start_targets()) Required registration failed for cms-OST0006: -98
 LustreError: 6065:0:(obd_mount.c:1655:server_fill_super()) Unable to start targets: -98
 LustreError: 6065:0:(obd_mount.c:1438:server_put_super()) no obd cms-OST0006
 LustreError: 6065:0:(obd_mount.c:147:server_deregister_mount()) cms-OST0006 not registered
 
 What do we need to do to get these OSTs back into the filesystem?
 
 We really want to reuse the original indices.
 
 This is Lustre 1.8.4, btw.
 
 Thanks,
 Craig Prescott
 UF HPC Center
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss
 


Re: [Lustre-discuss] how to reuse OST indices (EADDRINUSE)

2010-12-21 Thread Charles Taylor

On Dec 21, 2010, at 12:39 PM, Andreas Dilger wrote:

 It's unfortunate that you didn't see the thread from a few weeks ago  
 that discussed this exact topic of OST replacement.

Agreed.  :(

 It should get a section in the manual I think.

Agreed.

 This file is at /O/0/LAST_ID (capital 'O', then zero) and should be 
 copied for the OSTs you haven't replaced yet, along with the other 
 files. For the 6 OSTs that have already been replaced, it can be 
 recreated with a binary editor from the value on the MDS (lctl 
 get_param osc.*.prealloc_next_id). Search the list or bugzilla for 
 "LAST_ID" for a detailed procedure.

This seems to do the trick. Thank you! One important clarification, 
though: on the MDS, should we be getting the value of prealloc_next_id 
or prealloc_last_id? Section 23.3.9 of the 2.0 Operations Manual ("How 
to Fix a Bad LAST_ID on an OST") seems to use prealloc_last_id. Which 
should we be using?

Thank you again,

Charlie Taylor
UF HPC Center




Re: [Lustre-discuss] how to reuse OST indices (EADDRINUSE)

2010-12-20 Thread Wang Yibin
Hello,

Did you back up the old magic files (last_rcvd, LAST_ID, CONFIGS/*) from the 
original OSTs and put them back before trying to mount them?
You probably didn't do that, so when you remount the OSTs with an existing 
index, the MGS refuses to add them without being told to writeconf, hence 
-EADDRINUSE.
The proper way to replace an OST is described in bug 24128.

On 2010-12-21, at 8:33 AM, Craig Prescott wrote:

 
 Hello list,
 
 We recently evacuated several OSTs on a single OSS, replaced RAID 
 controllers, re-initialized RAIDs for new OSTs, and made new Lustre 
 filesystems for them, using the same OST indices as we had before.
 
 The filesystem and all its clients have been up and running the whole 
 time.  We disabled the OSTs we were working on on all clients and our 
 MGS/MDS (lctl dl shows them as IN everywhere).
 
 Now we want to bring the newly-formatted OSTs back online.  When we try 
 to mount the new OSTs, we get this for each one in the syslog of the 
 OSS that has been under maintenance:
 
 Lustre: mgc10.13.28@o2ib: Reactivating import
 LustreError: 11-0: an error occurred while communicating with 10.13.28@o2ib. The mgs_target_reg operation failed with -98
 LustreError: 6065:0:(obd_mount.c:1097:server_start_targets()) Required 
 registration failed for cms-OST0006: -98
 LustreError: 6065:0:(obd_mount.c:1655:server_fill_super()) Unable to start 
 targets: -98
 LustreError: 6065:0:(obd_mount.c:1438:server_put_super()) no obd cms-OST0006
 LustreError: 6065:0:(obd_mount.c:147:server_deregister_mount()) cms-OST0006 
 not registered
 
 What do we need to do to get these OSTs back into the filesystem?
 
 We really want to reuse the original indices.
 
 This is Lustre 1.8.4, btw.
 
 Thanks,
 Craig Prescott
 UF HPC Center
