[Lustre-discuss] Lustre DRBD failover time

2009-07-14 Thread tao.a.wu

Hi, all,

I am evaluating Lustre with DRBD failover, and experiencing about 2 minutes of 
OSS failover time to switch to the secondary node.  Has anyone made a similar 
observation (so that we can conclude this is expected), or are there 
parameters I should tune to reduce that time?

I have a simple setup: the MDS and OSS0 are hosted on server1, and OSS1 is 
hosted on server2.  OSS0 and OSS1 are the primary nodes for OST0 and OST1, 
respectively, and the OSTs are replicated using DRBD (protocol C) to the other 
machine.  The two OSTs are about 73GB each.  I am running Lustre 1.6 + DRBD 8 + 
Heartbeat v2 (but using v1 configuration).
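For reference, a minimal configuration along these lines might look like the 
following.  All hostnames, device names, mount points, and addresses are 
illustrative assumptions, not taken from my actual setup:

```
# /etc/ha.d/ha.cf (Heartbeat v1-style, same on both nodes)
keepalive 2
deadtime 6              # matches the ~10 s failure detection mentioned below
node server1 server2

# /etc/ha.d/haresources
# server1 is primary for OST0, server2 for OST1
server1 drbddisk::ost0 Filesystem::/dev/drbd0::/mnt/ost0::lustre
server2 drbddisk::ost1 Filesystem::/dev/drbd1::/mnt/ost1::lustre

# /etc/drbd.conf (excerpt) -- protocol C is synchronous replication
resource ost0 {
  protocol C;
  on server1 { device /dev/drbd0; disk /dev/sdb1; address 10.0.0.1:7788; meta-disk internal; }
  on server2 { device /dev/drbd0; disk /dev/sdb1; address 10.0.0.2:7788; meta-disk internal; }
}
```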

From HA logs, it looks like Heartbeat noticed the node was down within 10 seconds 
(which is consistent with the deadtime of 6 seconds).  Where does the secondary 
node spend the remaining 100-110 seconds?  There was a post 
(http://groups.google.com/group/lustre-discuss-list/msg/bbbeac047df678ca?dmode=source)
 attributing MDS failover time to fsck.  Could that also be the cause of my problem?

Thanks,

-Tao




___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre DRBD failover time

2009-07-14 Thread Brian J. Murrell
On Tue, 2009-07-14 at 17:54 +0200, tao.a...@nokia.com wrote:
  
 Hi, all,
  
 I am evaluating Lustre with DRBD failover, and experiencing about 2
 minutes in OSS failover time to switch to the secondary node.

What is this 2 minutes including?  Just the time for the second OSS to
mount the disk and start recovery or is it 2 minutes to detect the
primary failure and have the secondary complete recovery so that the
clients are fully functional again?

If the latter, then you are doing quite well.  Recovery is not an
instantaneous process.  Much work needs to be done to ensure coherency
between what is on the disk of the failed over OST and what the clients
think is on disk.  Getting to this state requires that all clients
synchronize with the OST and getting/waiting for many clients to do this
can, currently, take many minutes as each client has to first notice the
primary is dead and sync up with the failover.  Some clients might not
even be available to sync, in which case you have to wait for a timeout.

So if you are talking 2 minutes from failure to full recovery, you are
not likely going to put much of a dent in this.

Lustre 1.8 has adaptive timeouts enabled and that should help in optimal
situations, but it will still take time to do a full recovery.
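As a rough sanity check, the observed numbers do add up if the recovery window
is bounded by the Lustre timeout.  This sketch assumes Lustre 1.6's default
obd_timeout of 100 seconds (verify on your system, e.g. via the lustre.timeout
sysctl); the detection-overhead figure is also just an assumption fitted to the
~10 s detection reported above:

```python
# Back-of-the-envelope failover budget.  All values are assumptions to be
# checked against the actual cluster configuration.
heartbeat_deadtime = 6    # seconds; Heartbeat deadtime from the original post
detection_overhead = 4    # extra seconds before Heartbeat acts (~10 s total observed)
obd_timeout = 100         # assumed Lustre 1.6 default; recovery waits up to
                          # roughly this long for clients that never reconnect

detection = heartbeat_deadtime + detection_overhead
worst_case_recovery = obd_timeout  # if some clients are absent and must time out

total = detection + worst_case_recovery
print(total)  # 110 -- in line with the ~2 minutes observed
```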

b.



Re: [Lustre-discuss] Lustre DRBD failover time

2009-07-14 Thread Cliff White
tao.a...@nokia.com wrote:
  
 Hi, all,
  
 I am evaluating Lustre with DRBD failover, and experiencing about 2 
 minutes of OSS failover time to switch to the secondary node.  Has 
 anyone made a similar observation (so that we can conclude this is 
 expected), or are there parameters I should tune to 
 reduce that time?
  
 I have a simple setup: the MDS and OSS0 are hosted on server1, and OSS1 
 is hosted on server2.  OSS0 and OSS1 are the primary nodes for OST0 and 
 OST1, respectively, and the OSTs are replicated using DRBD (protocol C) 
 to the other machine.  The two OSTs are about 73GB each.  I am running 
 Lustre 1.6 + DRBD 8 + Heartbeat v2 (but using v1 configuration).
  
  From HA logs, it looks like Heartbeat noticed the node was down within 10 
 seconds (which is consistent with the deadtime of 6 seconds).  Where does 
 the secondary node spend the remaining 100-110 seconds?  There was a 
 post 
 (http://groups.google.com/group/lustre-discuss-list/msg/bbbeac047df678ca?dmode=source)
 attributing MDS failover time to fsck.  Does it also cause my problem?

As Brian mentioned, Lustre servers go through a recovery process.
You need to examine the system logs on the OSS - if Lustre is in recovery, 
there will be messages in the logs explaining this.
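For example, on the OSS that took over the OST you can watch recovery directly
(paths as in Lustre 1.6; they may differ on other versions, so treat these as
a sketch rather than exact commands for your system):

```
# Per-OST recovery state: status, connected/completed clients, time remaining
cat /proc/fs/lustre/obdfilter/*/recovery_status

# Recovery progress is also logged by the kernel
dmesg | grep -i recover
grep -i recovery /var/log/messages
```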

cliffw



 Thanks,
  
 -Tao


Re: [Lustre-discuss] Lustre DRBD failover time

2009-07-14 Thread tao.a.wu
Yes, it is the latter... Thanks for the info.

A related but different question: Lustre 2.0 will have replication.  Under 2.0 
(with replication), what would happen if the primary node goes down?  Would the 
backup node be able to take over the load in a shorter period of time?  Or is the 
replication feature for something else?

Thanks,

-Tao



Re: [Lustre-discuss] Lustre DRBD failover time

2009-07-14 Thread Andreas Dilger
On Jul 14, 2009  21:05 +0200, tao.a...@nokia.com wrote:
 A related but different question: Lustre 2.0 will have replication.
 Under 2.0 (with replication), what would happen if the primary node
 goes down?  Would the backup node be able to take over the load in
 a shorter period of time?  Or is the replication feature for something else?

The replication feature has nothing to do with what you are thinking.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
