Re: [Pacemaker] [Problem or Enhancement]When attrd reboots, a fail count is initialized.

2010-10-05 Thread renayama19661014
Hi Andrew,

I have registered this issue in Bugzilla as a feature enhancement.

 * http://developerbugs.linux-foundation.org/show_bug.cgi?id=2501

Thanks,
Hideo Yamauchi.


--- renayama19661...@ybb.ne.jp wrote:

 Hi Andrew,
 
 Thank you for comment.
 
   Would this change to attrd and crmd be difficult?
  
  I don't think so.
  But it's not a huge priority because I've never heard of attrd actually crashing.
  
  So while I agree that it's theoretically a problem, in practice no-one
  is going to hit this in production.
  Even if they were unlucky enough to see it, at worst the resource is
  able to run on the node again - which doesn't seem that bad for a HA
  cluster :-)
 
 
 All right.
 
 I will register this problem in Bugzilla as an enhancement request first.
 Then I will wait a little for opinions from other users.
 
 Thanks,
 Hideo Yamauchi.
 
 --- Andrew Beekhof and...@beekhof.net wrote:
 
  On Fri, Oct 1, 2010 at 4:00 AM,  renayama19661...@ybb.ne.jp wrote:
   Hi Andrew,
  
   Thank you for comment.
  
   During crmd startup, one could read all the values from attrd into the
   hashtable.
   So the hashtable would only do something if only attrd went down.
  
   If attrd communicates with crmd when it starts and reads back the data in the hash table, the problem seems solvable.
  
   Would this change to attrd and crmd be difficult?
  
  I don't think so.
  But it's not a huge priority because I've never heard of attrd actually crashing.
  
  So while I agree that it's theoretically a problem, in practice no-one
  is going to hit this in production.
  Even if they were unlucky enough to see it, at worst the resource is
  able to run on the node again - which doesn't seem that bad for a HA
  cluster :-)
  
  
  
   I mean: did you see this behavior in a production system, or only
   during testing when you manually killed attrd?
  
   We run a kill command manually as one of our tests for process failures.
   Our users care very much about how the cluster behaves when a process fails.
  
   Best Regards,
   Hideo Yamauchi.
  
   --- Andrew Beekhof and...@beekhof.net wrote:
  
   On Wed, Sep 29, 2010 at 3:59 AM, renayama19661...@ybb.ne.jp wrote:
Hi Andrew,
   
Thank you for comment.
   
The problem here is that attrd is supposed to be the authoritative
source for this sort of data.
   
Yes. I understand.
   
Additionally, you don't always want attrd reading from the status
section - like after the cluster restarts.
   
The problem also seems solvable if attrd retrieves the status section from the cib after it reboots.
Method 2, which I suggested, means exactly that.
 method 2) When attrd starts, it communicates with the cib and receives the fail-count.
   
For failcount, the crmd could keep a hashtable of the current values
which it could re-send to attrd if it detects a disconnection.
But that might not be a generic-enough solution.
   
If a hash table in crmd can maintain the values, that may be a good idea.
However, I have a feeling the same problem happens when crmd itself fails and is restarted.
  
   During crmd startup, one could read all the values from attrd into the
   hashtable.
   So the hashtable would only do something if only attrd went down.
  
   
The chance that attrd dies _and_ there were relevant values for
fail-count is pretty remote though... is this a real problem you've
experienced or a theoretical one?
   
I did not quite understand your meaning.
Do you mean that attrd's fail-count also exists on the other node?
  
   I mean: did you see this behavior in a production system, or only
   during testing when you manually killed attrd?
  
   
Best Regards,
Hideo Yamauchi.
   
--- Andrew Beekhof and...@beekhof.net wrote:
   
On Mon, Sep 27, 2010 at 7:26 AM, renayama19661...@ybb.ne.jp wrote:
 Hi,

 When I investigated another problem, I discovered this phenomenon.
 If attrd fails but does not restart, the problem does not occur.

 Step1) After start, it causes a monitor error in UmIPaddr twice.

 Online: [ srv01 srv02 ]

  Resource Group: UMgroup01
     UmVIPcheck (ocf::heartbeat:Dummy): Started srv01
     UmIPaddr   (ocf::heartbeat:Dummy2):        Started srv01

 Migration summary:
 * Node srv02:
 * Node srv01:
   UmIPaddr: migration-threshold=10 fail-count=2

 Step2) Kill Attrd and Attrd reboots.

 Online: [ srv01 srv02 ]

  Resource Group: UMgroup01
     UmVIPcheck (ocf::heartbeat:Dummy): Started srv01
     UmIPaddr   (ocf::heartbeat:Dummy2):        Started srv01

 Migration summary:
 * Node srv02:
 * Node srv01:
 

Re: [Pacemaker] [Problem or Enhancement]When attrd reboots, a fail count is initialized.

2010-10-04 Thread renayama19661014
Hi Andrew,

Thank you for comment.

  Would this change to attrd and crmd be difficult?
 
 I don't think so.
 But it's not a huge priority because I've never heard of attrd actually crashing.
 
 So while I agree that it's theoretically a problem, in practice no-one
 is going to hit this in production.
 Even if they were unlucky enough to see it, at worst the resource is
 able to run on the node again - which doesn't seem that bad for a HA
 cluster :-)


All right.

I will register this problem in Bugzilla as an enhancement request first.
Then I will wait a little for opinions from other users.

Thanks,
Hideo Yamauchi.

--- Andrew Beekhof and...@beekhof.net wrote:

 On Fri, Oct 1, 2010 at 4:00 AM,  renayama19661...@ybb.ne.jp wrote:
  Hi Andrew,
 
  Thank you for comment.
 
  During crmd startup, one could read all the values from attrd into the
  hashtable.
  So the hashtable would only do something if only attrd went down.
 
  If attrd communicates with crmd when it starts and reads back the data in the hash table, the problem seems solvable.
 
  Would this change to attrd and crmd be difficult?
 
 I don't think so.
 But it's not a huge priority because I've never heard of attrd actually crashing.
 
 So while I agree that it's theoretically a problem, in practice no-one
 is going to hit this in production.
 Even if they were unlucky enough to see it, at worst the resource is
 able to run on the node again - which doesn't seem that bad for a HA
 cluster :-)
 
 
 
  I mean: did you see this behavior in a production system, or only
  during testing when you manually killed attrd?
 
  We run a kill command manually as one of our tests for process failures.
  Our users care very much about how the cluster behaves when a process fails.
 
  Best Regards,
  Hideo Yamauchi.
 
  --- Andrew Beekhof and...@beekhof.net wrote:
 
  On Wed, Sep 29, 2010 at 3:59 AM, renayama19661...@ybb.ne.jp wrote:
   Hi Andrew,
  
   Thank you for comment.
  
   The problem here is that attrd is supposed to be the authoritative
   source for this sort of data.
  
   Yes. I understand.
  
   Additionally, you don't always want attrd reading from the status
   section - like after the cluster restarts.
  
   The problem also seems solvable if attrd retrieves the status section from the cib after it reboots.
   Method 2, which I suggested, means exactly that.
method 2) When attrd starts, it communicates with the cib and receives the fail-count.
  
   For failcount, the crmd could keep a hashtable of the current values
   which it could re-send to attrd if it detects a disconnection.
   But that might not be a generic-enough solution.
  
   If a hash table in crmd can maintain the values, that may be a good idea.
   However, I have a feeling the same problem happens when crmd itself fails and is restarted.
 
  During crmd startup, one could read all the values from attrd into the
  hashtable.
  So the hashtable would only do something if only attrd went down.
 
  
   The chance that attrd dies _and_ there were relevant values for
   fail-count is pretty remote though... is this a real problem you've
   experienced or a theoretical one?
  
   I did not quite understand your meaning.
   Do you mean that attrd's fail-count also exists on the other node?
 
  I mean: did you see this behavior in a production system, or only
  during testing when you manually killed attrd?
 
  
   Best Regards,
   Hideo Yamauchi.
  
   --- Andrew Beekhof and...@beekhof.net wrote:
  
   On Mon, Sep 27, 2010 at 7:26 AM, renayama19661...@ybb.ne.jp wrote:
Hi,
   
When I investigated another problem, I discovered this phenomenon.
 If attrd fails but does not restart, the problem does not occur.
   
Step1) After start, it causes a monitor error in UmIPaddr twice.
   
Online: [ srv01 srv02 ]
   
 Resource Group: UMgroup01
    UmVIPcheck (ocf::heartbeat:Dummy): Started srv01
    UmIPaddr   (ocf::heartbeat:Dummy2):        Started srv01
   
Migration summary:
* Node srv02:
* Node srv01:
  UmIPaddr: migration-threshold=10 fail-count=2
   
Step2) Kill Attrd and Attrd reboots.
   
Online: [ srv01 srv02 ]
   
 Resource Group: UMgroup01
    UmVIPcheck (ocf::heartbeat:Dummy): Started srv01
    UmIPaddr   (ocf::heartbeat:Dummy2):        Started srv01
   
Migration summary:
* Node srv02:
* Node srv01:
  UmIPaddr: migration-threshold=10 fail-count=2
   
Step3) It causes a monitor error in UmIPaddr.
   
Online: [ srv01 srv02 ]
   
 Resource Group: UMgroup01
    UmVIPcheck (ocf::heartbeat:Dummy): Started srv01
    UmIPaddr   (ocf::heartbeat:Dummy2):        Started srv01
   

Re: [Pacemaker] [Problem or Enhancement]When attrd reboots, a fail count is initialized.

2010-10-01 Thread Andrew Beekhof
On Fri, Oct 1, 2010 at 4:00 AM,  renayama19661...@ybb.ne.jp wrote:
 Hi Andrew,

 Thank you for comment.

 During crmd startup, one could read all the values from attrd into the
 hashtable.
 So the hashtable would only do something if only attrd went down.

  If attrd communicates with crmd when it starts and reads back the data in the hash table, the problem seems solvable.

 Would this change to attrd and crmd be difficult?

I don't think so.
But it's not a huge priority because I've never heard of attrd actually crashing.

So while I agree that it's theoretically a problem, in practice no-one
is going to hit this in production.
Even if they were unlucky enough to see it, at worst the resource is
able to run on the node again - which doesn't seem that bad for a HA
cluster :-)



 I mean: did you see this behavior in a production system, or only
 during testing when you manually killed attrd?

 We run a kill command manually as one of our tests for process failures.
 Our users care very much about how the cluster behaves when a process fails.

 Best Regards,
 Hideo Yamauchi.

 --- Andrew Beekhof and...@beekhof.net wrote:

 On Wed, Sep 29, 2010 at 3:59 AM,  renayama19661...@ybb.ne.jp wrote:
  Hi Andrew,
 
  Thank you for comment.
 
  The problem here is that attrd is supposed to be the authoritative
  source for this sort of data.
 
  Yes. I understand.
 
  Additionally, you don't always want attrd reading from the status
  section - like after the cluster restarts.
 
  The problem also seems solvable if attrd retrieves the status section from the cib after it reboots.
  Method 2, which I suggested, means exactly that.
   method 2) When attrd starts, it communicates with the cib and receives the fail-count.
 
  For failcount, the crmd could keep a hashtable of the current values
  which it could re-send to attrd if it detects a disconnection.
  But that might not be a generic-enough solution.
 
  If a hash table in crmd can maintain the values, that may be a good idea.
  However, I have a feeling the same problem happens when crmd itself fails and is restarted.

 During crmd startup, one could read all the values from attrd into the
 hashtable.
 So the hashtable would only do something if only attrd went down.

 
  The chance that attrd dies _and_ there were relevant values for
  fail-count is pretty remote though... is this a real problem you've
  experienced or a theoretical one?
 
  I did not quite understand your meaning.
  Do you mean that attrd's fail-count also exists on the other node?

 I mean: did you see this behavior in a production system, or only
 during testing when you manually killed attrd?

 
  Best Regards,
  Hideo Yamauchi.
 
  --- Andrew Beekhof and...@beekhof.net wrote:
 
  On Mon, Sep 27, 2010 at 7:26 AM, renayama19661...@ybb.ne.jp wrote:
   Hi,
  
   When I investigated another problem, I discovered this phenomenon.
   If attrd fails but does not restart, the problem does not occur.
  
   Step1) After start, it causes a monitor error in UmIPaddr twice.
  
   Online: [ srv01 srv02 ]
  
    Resource Group: UMgroup01
       UmVIPcheck (ocf::heartbeat:Dummy): Started srv01
       UmIPaddr   (ocf::heartbeat:Dummy2):        Started srv01
  
   Migration summary:
   * Node srv02:
   * Node srv01:
     UmIPaddr: migration-threshold=10 fail-count=2
  
   Step2) Kill Attrd and Attrd reboots.
  
   Online: [ srv01 srv02 ]
  
    Resource Group: UMgroup01
       UmVIPcheck (ocf::heartbeat:Dummy): Started srv01
       UmIPaddr   (ocf::heartbeat:Dummy2):        Started srv01
  
   Migration summary:
   * Node srv02:
   * Node srv01:
     UmIPaddr: migration-threshold=10 fail-count=2
  
   Step3) It causes a monitor error in UmIPaddr.
  
   Online: [ srv01 srv02 ]
  
    Resource Group: UMgroup01
       UmVIPcheck (ocf::heartbeat:Dummy): Started srv01
       UmIPaddr   (ocf::heartbeat:Dummy2):        Started srv01
  
   Migration summary:
   * Node srv02:
   * Node srv01:
     UmIPaddr: migration-threshold=10 fail-count=1  <- The fail-count has gone back to the start.
  
    The problem is that attrd loses the fail-count when it reboots (its hash table is lost).
    It is a serious problem that the failure count is reset.
  
    I think the following methods are possible.
   
    method 1) attrd maintains the fail-count in a file under /var/run and refers to it.
   
    method 2) When attrd starts, it communicates with the cib and receives the fail-count.
   
    Is there a better method?
   
    Please consider a solution to this problem.
 
  H... a tricky one.
 
  The problem here is that attrd is supposed to be the authoritative
  source for this sort of data.
  Additionally, you don't always want attrd reading 

Re: [Pacemaker] [Problem or Enhancement]When attrd reboots, a fail count is initialized.

2010-09-30 Thread renayama19661014
Hi Andrew,

Thank you for comment.

 During crmd startup, one could read all the values from attrd into the
 hashtable.
 So the hashtable would only do something if only attrd went down.

If attrd communicates with crmd when it starts and reads back the data in the hash table, the problem seems solvable.

Would this change to attrd and crmd be difficult?
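
As a rough illustration of that idea, the sketch below (plain Python, not Pacemaker code; every class and method name is hypothetical) models a freshly started attrd asking crmd to replay its cached fail-count values before it accepts new updates:

# Conceptual model only; it shows attrd priming its hash table from crmd at startup.

class CrmdCache:
    """crmd-side copy of the attribute values it has pushed through attrd."""
    def __init__(self):
        self.values = {}                      # e.g. {"fail-count-UmIPaddr": "2"}

    def record(self, name, value):
        self.values[name] = value             # updated on every fail-count bump

    def replay(self):
        return dict(self.values)              # what crmd would send back


class Attrd:
    """attrd-side hash table, normally lost when the process is killed."""
    def __init__(self):
        self.table = {}

    def start(self, crmd_cache):
        # On startup, fill the empty hash table from crmd's cache.
        self.table.update(crmd_cache.replay())

    def update(self, name, value, crmd_cache):
        self.table[name] = value
        crmd_cache.record(name, value)        # keep both views in sync


crmd = CrmdCache()
attrd = Attrd()
attrd.update("fail-count-UmIPaddr", "2", crmd)

attrd = Attrd()                               # attrd killed and respawned
attrd.start(crmd)                             # prime from crmd instead of zero
print(attrd.table)                            # {'fail-count-UmIPaddr': '2'}

A restarted attrd then starts from crmd's copy of the counts instead of from an empty table.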


 I mean: did you see this behavior in a production system, or only
 during testing when you manually killed attrd?

We run a kill command manually as one of our tests for process failures.
Our users care very much about how the cluster behaves when a process fails.

Best Regards,
Hideo Yamauchi.

--- Andrew Beekhof and...@beekhof.net wrote:

 On Wed, Sep 29, 2010 at 3:59 AM,  renayama19661...@ybb.ne.jp wrote:
  Hi Andrew,
 
  Thank you for comment.
 
  The problem here is that attrd is supposed to be the authoritative
  source for this sort of data.
 
  Yes. I understand.
 
  Additionally, you don't always want attrd reading from the status
  section - like after the cluster restarts.
 
  The problem also seems solvable if attrd retrieves the status section from the cib after it reboots.
  Method 2, which I suggested, means exactly that.
   method 2) When attrd starts, it communicates with the cib and receives the fail-count.
 
  For failcount, the crmd could keep a hashtable of the current values
  which it could re-send to attrd if it detects a disconnection.
  But that might not be a generic-enough solution.
 
  If a hash table in crmd can maintain the values, that may be a good idea.
  However, I have a feeling the same problem happens when crmd itself fails and is restarted.
 
 During crmd startup, one could read all the values from attrd into the
 hashtable.
 So the hashtable would only do something if only attrd went down.
 
 
  The chance that attrd dies _and_ there were relevant values for
  fail-count is pretty remote though... is this a real problem you've
  experienced or a theoretical one?
 
  I did not quite understand your meaning.
  Do you mean that attrd's fail-count also exists on the other node?
 
 I mean: did you see this behavior in a production system, or only
 during testing when you manually killed attrd?
 
 
  Best Regards,
  Hideo Yamauchi.
 
  --- Andrew Beekhof and...@beekhof.net wrote:
 
  On Mon, Sep 27, 2010 at 7:26 AM, renayama19661...@ybb.ne.jp wrote:
   Hi,
  
   When I investigated another problem, I discovered this phenomenon.
   If attrd fails but does not restart, the problem does not occur.
  
   Step1) After start, it causes a monitor error in UmIPaddr twice.
  
   Online: [ srv01 srv02 ]
  
    Resource Group: UMgroup01
       UmVIPcheck (ocf::heartbeat:Dummy): Started srv01
       UmIPaddr   (ocf::heartbeat:Dummy2):        Started srv01
  
   Migration summary:
   * Node srv02:
   * Node srv01:
     UmIPaddr: migration-threshold=10 fail-count=2
  
   Step2) Kill Attrd and Attrd reboots.
  
   Online: [ srv01 srv02 ]
  
    Resource Group: UMgroup01
       UmVIPcheck (ocf::heartbeat:Dummy): Started srv01
       UmIPaddr   (ocf::heartbeat:Dummy2):        Started srv01
  
   Migration summary:
   * Node srv02:
   * Node srv01:
     UmIPaddr: migration-threshold=10 fail-count=2
  
   Step3) It causes a monitor error in UmIPaddr.
  
   Online: [ srv01 srv02 ]
  
    Resource Group: UMgroup01
       UmVIPcheck (ocf::heartbeat:Dummy): Started srv01
       UmIPaddr   (ocf::heartbeat:Dummy2):        Started srv01
  
   Migration summary:
   * Node srv02:
   * Node srv01:
     UmIPaddr: migration-threshold=10 fail-count=1  <- The fail-count has gone back to the start.
  
    The problem is that attrd loses the fail-count when it reboots (its hash table is lost).
    It is a serious problem that the failure count is reset.
  
    I think the following methods are possible.
   
    method 1) attrd maintains the fail-count in a file under /var/run and refers to it.
   
    method 2) When attrd starts, it communicates with the cib and receives the fail-count.
   
    Is there a better method?
   
    Please consider a solution to this problem.
 
  H... a tricky one.
 
  The problem here is that attrd is supposed to be the authoritative
  source for this sort of data.
  Additionally, you don't always want attrd reading from the status
  section - like after the cluster restarts.
 
  For failcount, the crmd could keep a hashtable of the current values
  which it could re-send to attrd if it detects a disconnection.
  But that might not be a generic-enough solution.
 
  The chance that attrd dies _and_ there were relevant values for
  fail-count is pretty remote though... is this a real problem you've
  experienced or a theoretical one?
 
  

Re: [Pacemaker] [Problem or Enhancement]When attrd reboots, a fail count is initialized.

2010-09-28 Thread Andrew Beekhof
On Mon, Sep 27, 2010 at 7:26 AM,  renayama19661...@ybb.ne.jp wrote:
 Hi,

 When I investigated another problem, I discovered this phenomenon.
 If attrd fails but does not restart, the problem does not occur.

 Step1) After start, it causes a monitor error in UmIPaddr twice.

 Online: [ srv01 srv02 ]

  Resource Group: UMgroup01
     UmVIPcheck (ocf::heartbeat:Dummy): Started srv01
     UmIPaddr   (ocf::heartbeat:Dummy2):        Started srv01

 Migration summary:
 * Node srv02:
 * Node srv01:
   UmIPaddr: migration-threshold=10 fail-count=2

 Step2) Kill Attrd and Attrd reboots.

 Online: [ srv01 srv02 ]

  Resource Group: UMgroup01
     UmVIPcheck (ocf::heartbeat:Dummy): Started srv01
     UmIPaddr   (ocf::heartbeat:Dummy2):        Started srv01

 Migration summary:
 * Node srv02:
 * Node srv01:
   UmIPaddr: migration-threshold=10 fail-count=2

 Step3) It causes a monitor error in UmIPaddr.

 Online: [ srv01 srv02 ]

  Resource Group: UMgroup01
     UmVIPcheck (ocf::heartbeat:Dummy): Started srv01
     UmIPaddr   (ocf::heartbeat:Dummy2):        Started srv01

 Migration summary:
 * Node srv02:
 * Node srv01:
   UmIPaddr: migration-threshold=10 fail-count=1  <- The fail-count has gone back to the start.

 The problem is that attrd loses the fail-count when it reboots (its hash table is lost).
 It is a serious problem that the failure count is reset.
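
 For what it is worth, Step 2 can be scripted on a test node roughly as below. It assumes a Pacemaker 1.0-era setup where the attribute daemon runs as a process named "attrd" and "crm_mon -1 -f" prints the migration summary shown above; both are assumptions to adjust for your build, and Step 1 and Step 3 (the induced monitor failures) are still done by hand.

 import re
 import subprocess
 import time

 def fail_count(resource):
     """Scrape the fail-count for `resource` from one-shot crm_mon output."""
     out = subprocess.run(["crm_mon", "-1", "-f"],
                          capture_output=True, text=True, check=True).stdout
     m = re.search(rf"{resource}:.*fail-count=(\d+)", out)
     return int(m.group(1)) if m else 0

 print("before kill:", fail_count("UmIPaddr"))       # expect 2 after Step 1

 subprocess.run(["killall", "-9", "attrd"], check=True)   # Step 2: kill attrd
 time.sleep(10)                                      # let the cluster respawn it

 # The CIB still reports the old value here, but attrd's in-memory hash table
 # is now empty, so the next monitor failure (Step 3) writes fail-count=1.
 print("after restart:", fail_count("UmIPaddr"))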

 I think the following methods are possible.

 method 1) attrd maintains the fail-count in a file under /var/run and refers to it.

 method 2) When attrd starts, it communicates with the cib and receives the fail-count.

 Is there a better method?

 Please consider a solution to this problem.
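
 A minimal sketch of what method 1 could look like, purely as an illustration (the file name, format and location are assumptions, not anything attrd actually does):

 import json
 import os

 STATE_FILE = "/var/run/attrd-failcounts.json"   # hypothetical location

 class PersistentFailCounts:
     def __init__(self, path=STATE_FILE):
         self.path = path
         self.table = {}
         if os.path.exists(path):                 # survive a daemon restart
             with open(path) as f:
                 self.table = json.load(f)

     def update(self, name, value):
         self.table[name] = value
         tmp = self.path + ".tmp"                 # write the whole table atomically
         with open(tmp, "w") as f:
             json.dump(self.table, f)
         os.replace(tmp, self.path)

 # Usage: counts = PersistentFailCounts(); counts.update("fail-count-UmIPaddr", "2")
 # A new PersistentFailCounts() created after a crash starts from the saved
 # table instead of from an empty one.

 On most distributions /var/run is cleared at boot, so the saved counts would survive a daemon crash but not a full restart of the node, which seems to match the intent.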

H... a tricky one.

The problem here is that attrd is supposed to be the authoritative
source for this sort of data.
Additionally, you don't always want attrd reading from the status
section - like after the cluster restarts.

For failcount, the crmd could keep a hashtable of the current values
which it could re-send to attrd if it detects a disconnection.
But that might not be a generic-enough solution.
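
To make the proposal concrete, a toy model of that control flow might look like this (plain Python with hypothetical names; the real crmd and attrd talk over IPC, which is only faked here):

class AttrdConnection:
    """Stand-in for the crmd-to-attrd channel."""
    def __init__(self):
        self.store = {}              # what the live attrd process holds

    def send(self, name, value):
        self.store[name] = value

    def crash(self):
        self.store = {}              # a killed attrd loses its hash table


class CrmdAttributeCache:
    def __init__(self, conn):
        self.conn = conn
        self.cache = {}

    def set_attr(self, name, value):
        self.cache[name] = value     # remember locally...
        self.conn.send(name, value)  # ...and forward to attrd as usual

    def on_reconnect(self):
        # Called when crmd detects the attrd connection was re-established:
        # replay everything so the new attrd instance does not start empty.
        for name, value in self.cache.items():
            self.conn.send(name, value)


conn = AttrdConnection()
crmd = CrmdAttributeCache(conn)
crmd.set_attr("fail-count-UmIPaddr", "2")

conn.crash()                         # attrd killed: its table is gone
crmd.on_reconnect()                  # crmd notices and re-sends its cache
print(conn.store)                    # {'fail-count-UmIPaddr': '2'}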

The chance that attrd dies _and_ there were relevant values for
fail-count is pretty remote though... is this a real problem you've
experienced or a theoretical one?



Re: [Pacemaker] [Problem or Enhancement]When attrd reboots, a fail count is initialized.

2010-09-28 Thread renayama19661014
Hi Andrew,

Thank you for comment.

 The problem here is that attrd is supposed to be the authoritative
 source for this sort of data.

Yes. I understand.

 Additionally, you don't always want attrd reading from the status
 section - like after the cluster restarts.

The problem also seems solvable if attrd retrieves the status section from the cib after it reboots.
Method 2, which I suggested, means exactly that.
  method 2) When attrd starts, it communicates with the cib and receives the fail-count.
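
As a concrete illustration of method 2, the sketch below pulls the fail-count attributes for one node out of a CIB dump. It leans on "cibadmin -Q" and on a 1.0-era status-section layout (transient attributes stored as nvpair entries named fail-count-<resource>); treat both as assumptions. Andrew's caveat still applies: attrd probably should not do this after a full cluster restart, only after its own respawn.

import subprocess
import xml.etree.ElementTree as ET

def failcounts_from_cib(node):
    """Return {attribute name: value} for fail-count entries of `node`."""
    xml = subprocess.run(["cibadmin", "-Q"],
                         capture_output=True, text=True, check=True).stdout
    root = ET.fromstring(xml)
    counts = {}
    for state in root.iter("node_state"):
        if state.get("uname") != node:
            continue
        for nvpair in state.iter("nvpair"):
            name = nvpair.get("name", "")
            if name.startswith("fail-count-"):
                counts[name] = nvpair.get("value")
    return counts

# e.g. failcounts_from_cib("srv01") -> {"fail-count-UmIPaddr": "2"}
# attrd would load this into its hash table before serving updates, so a
# restart does not look like a clean slate.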

 For failcount, the crmd could keep a hashtable of the current values
 which it could re-send to attrd if it detects a disconnection.
 But that might not be a generic-enough solution.

If a hash table in crmd can maintain the values, that may be a good idea.
However, I have a feeling the same problem happens when crmd itself fails and is restarted.

 The chance that attrd dies _and_ there were relevant values for
 fail-count is pretty remote though... is this a real problem you've
 experienced or a theoretical one?

I did not quite understand your meaning.
Do you mean that attrd's fail-count also exists on the other node?

Best Regards,
Hideo Yamauchi.

--- Andrew Beekhof and...@beekhof.net wrote:

 On Mon, Sep 27, 2010 at 7:26 AM,  renayama19661...@ybb.ne.jp wrote:
  Hi,
 
  When I investigated another problem, I discovered this phenomenon.
  If attrd fails but does not restart, the problem does not occur.
 
  Step1) After start, it causes a monitor error in UmIPaddr twice.
 
  Online: [ srv01 srv02 ]
 
  Resource Group: UMgroup01
     UmVIPcheck (ocf::heartbeat:Dummy): Started srv01
     UmIPaddr   (ocf::heartbeat:Dummy2):        Started srv01
 
  Migration summary:
  * Node srv02:
  * Node srv01:
   UmIPaddr: migration-threshold=10 fail-count=2
 
  Step2) Kill Attrd and Attrd reboots.
 
  Online: [ srv01 srv02 ]
 
  Resource Group: UMgroup01
     UmVIPcheck (ocf::heartbeat:Dummy): Started srv01
     UmIPaddr   (ocf::heartbeat:Dummy2):        Started srv01
 
  Migration summary:
  * Node srv02:
  * Node srv01:
   UmIPaddr: migration-threshold=10 fail-count=2
 
  Step3) It causes a monitor error in UmIPaddr.
 
  Online: [ srv01 srv02 ]
 
  Resource Group: UMgroup01
     UmVIPcheck (ocf::heartbeat:Dummy): Started srv01
     UmIPaddr   (ocf::heartbeat:Dummy2):        Started srv01
 
  Migration summary:
  * Node srv02:
  * Node srv01:
   UmIPaddr: migration-threshold=10 fail-count=1  <- The fail-count has gone back to the start.
 
  The problem is that attrd loses the fail-count when it reboots (its hash table is lost).
  It is a serious problem that the failure count is reset.
 
  I think the following methods are possible.
 
  method 1) attrd maintains the fail-count in a file under /var/run and refers to it.
 
  method 2) When attrd starts, it communicates with the cib and receives the fail-count.
 
  Is there a better method?
 
  Please consider a solution to this problem.
 
 H... a tricky one.
 
 The problem here is that attrd is supposed to be the authoritative
 source for this sort of data.
 Additionally, you don't always want attrd reading from the status
 section - like after the cluster restarts.
 
 For failcount, the crmd could keep a hashtable of the current values
 which it could re-send to attrd if it detects a disconnection.
 But that might not be a generic-enough solution.
 
 The chance that attrd dies _and_ there were relevant values for
 fail-count is pretty remote though... is this a real problem you've
 experienced or a theoretical one?
 



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker