Re: [Lustre-discuss] MGS and MDT on Failover Pair

2008-09-12 Thread Cliff White
Brian J. Murrell wrote:
> On Wed, 2008-09-10 at 16:23 -0400, Roger Spellman wrote:
>> I am building a system with a redundant MDS, that is two MDS sharing a
>> set of disks, one being Active, the other Standby.
>>
>> If I put the MGS and MDS on the same system, it appears that they must
>> be on the same partition as well.
>
> No.
>
>> Otherwise, when there is a failover, the MGS will not fail over.  Is
>> that true?

If the MDT and MGT are separate partitions, then you will have to fail 
them over as separate services, as each partition will be mounted 
separately. The separate partitions can be on one system. Of course, any 
decent HA tool will allow you to fail over multiple services with one action.
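
Because separately mounted targets can also be unmounted separately, this layout allows the shutdown order asked about in the original question: MDT first, then the OSTs, then the MGT. A minimal dry-run sketch of that ordering follows. The function name and all mount points are hypothetical, and each command would need to be run on whichever node actually hosts that target.

```python
# Dry-run sketch: build umount commands in the order MDT -> OSTs -> MGT.
# Mount points are hypothetical placeholders; nothing is executed here.

def shutdown_commands(mdt_mount, ost_mounts, mgt_mount):
    """Return umount commands: the MDT first, then each OST, then the MGT."""
    ordered = [mdt_mount] + list(ost_mounts) + [mgt_mount]
    return [["umount", mount_point] for mount_point in ordered]


if __name__ == "__main__":
    commands = shutdown_commands(
        "/mnt/mdt",                    # hypothetical MDT mount point
        ["/mnt/ost0", "/mnt/ost1"],    # hypothetical OST mount points
        "/mnt/mgt",                    # hypothetical MGT mount point
    )
    for command in commands:
        print(" ".join(command))
```

With a combined MGS/MDT partition there is only one mount point, so this ordering is not available in the same way.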

I should note: the MGS is very small and is only used for configuration 
changes and mount information. If all your clients are already mounted, 
an MGS failure is quite transparent; you can run for quite some time 
with a dead MGS.

For a very robust system, I would suggest moving the MGS to a small 
machine (heck, a cheap laptop would work for all but the biggest sites),
replicating the MGT disk, and putting your failover dollars on the MGS.

You could build a very robust failover MGS for the cost of two cheap
whitebox PCs (modulo network hardware cost). Also, a separate MGS is 
recommended when you have more than one filesystem.
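
To illustrate the separate-MGS layout, here is a rough sketch that only builds the format command lines for one standalone MGT and for the MDTs of two filesystems that register with it. The flags (--mgs, --mdt, --fsname, --mgsnode) are given as I understand mkfs.lustre, so check them against the man page for your release. The filesystem names, the NID mgs1@tcp0, and the device paths are made up for the example.

```python
# Sketch: build (but do not run) mkfs.lustre command lines for a standalone
# MGT plus MDTs for several filesystems that all point at the same MGS.
# Filesystem names, NID, and device paths are hypothetical.

def mgt_format_cmd(device):
    """Command line to format a standalone MGT on the given device."""
    return ["mkfs.lustre", "--mgs", device]


def mdt_format_cmd(fsname, mgs_nid, device):
    """Command line to format an MDT that registers with a remote MGS."""
    return [
        "mkfs.lustre",
        f"--fsname={fsname}",
        "--mdt",
        f"--mgsnode={mgs_nid}",
        device,
    ]


if __name__ == "__main__":
    mgs_nid = "mgs1@tcp0"  # hypothetical NID of the small MGS machine
    print(" ".join(mgt_format_cmd("/dev/sda1")))
    for fsname, device in [("fs1", "/dev/sdb"), ("fs2", "/dev/sdc")]:
        print(" ".join(mdt_format_cmd(fsname, mgs_nid, device)))
```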

cliffw


 
> Not true.
>
> b.
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss



[Lustre-discuss] MGS and MDT on Failover Pair

2008-09-10 Thread Roger Spellman
I am building a system with a redundant MDS, that is two MDS sharing a
set of disks, one being Active, the other Standby.
 
If I put the MGS and MDS on the same system, it appears that they must
be on the same partition as well.  Otherwise, when there is a failover,
the MGS will not fail over.  Is that true?

Assuming that it is true, then I mount the MGS/MDT partition first, then
all the OSTs.
 
When I unmount, I usually want to unmount the MDT prior to unmounting
the OSTs, because the MDT is a client of the OSTs.  Is that possible
here?  In other words, I want to stop the MDT service first, then
unmount the OSTs, then unmount the MGS.  How can I stop the MDT service?

If I don't do this, that is, if I just unmount the OSTs followed by the
MGS/MDT, then I get errors like this:
 
Lustre: raid6-OST0002-osc: Connection to service raid6-OST0002 via nid
[EMAIL PROTECTED] was lost; in progress
operations using this service will wait for recovery to complete.
. . .
LustreError: 8964:0:(lov_obd.c:418:lov_disconnect_obd()) Target
raid6-OST_UUID disconnect error -5
LustreError: 8964:0:(lov_obd.c:418:lov_disconnect_obd()) Target
raid6-OST0002_UUID disconnect error -5
Lustre: Request x105 sent from raid6-OST0003-osc to NID [EMAIL PROTECTED]
5s ago has timed out (limit 5s).
. . .
Lustre: MGS has stopped.
Lustre: Mount still busy with 7 refs, waiting for 330 secs...


So, my question is:  Is there a way to stop just the MDT function PRIOR
to unmounting the OSTs?
 
Thanks.
 
-Roger
 
Roger Spellman
Staff Engineer
Terascala, Inc.
508-588-1501
www.terascala.com