Re: [ClusterLabs Developers] OCF_RESKEY_CRM_meta_notify_active_* always empty

2016-08-03 Thread Jehan-Guillaume de Rorthais
On Tue, 2 Aug 2016 10:02:36 -0500,
Ken Gaillot wrote:

> On 08/02/2016 03:10 AM, Jehan-Guillaume de Rorthais wrote:
> > On Mon, 1 Aug 2016 12:00:24 -0500,
> > Ken Gaillot wrote:
> > 
> >> On 08/01/2016 11:18 AM, Jehan-Guillaume de Rorthais wrote:
> >>> On Mon, 1 Aug 2016 10:27:53 -0500,
> >>> Ken Gaillot wrote:
> >>>
> >>>> On 07/29/2016 06:19 PM, Andrew Beekhof wrote:
> >>>>> Urgh. I must be confused with sles11.
> >>>>> In any case, the first version of pacemaker was identical to the last
> >>>>> heartbeat crm.
> >>>>>
> >>>>> I don't recall the ocfs2 agent changing design while I was there, so 11
> >>>>> may be broken too
> >>>>
> >>>> I just realized *_active_* is only broken for master/slave clones.
> >>>> Filesystem is not master/slave, so it wouldn't have any issue.
> >>>
> >>> Well, I'm glad we are the first RA using it :)
> >>>
> >>> I wonder how other m/s RAs are doing without it. We are using it (actually
> >>> "master + slave + start - stop" because of the bug) to check, during a
> >>> promotion after a failover, whether the resource being promoted is the best
> >>> one among the known ones.
> >>
> >> Yes, the very simple workaround is simply to set active = master +
> >> slave; that's all the pacemaker fix will do. You'll still need the "+
> >> start - stop" to get the situation after the action.
> > 
> > Ok, thank you for the confirmation.
> > 
> >> Coincidentally, we need to bump crm_feature_set to 3.0.11 anyway, so
> >> you'll be able to test that to tell whether *_active_* is correct, if
> >> desired.
> > 
> > I will test it.
> 
> fix is merged in master branch

I ran some superficial tests and it seems to work fine.

I'll do some more testing later, when I work on the OCF_Functions Perl module.

Regards,
-- 
Jehan-Guillaume de Rorthais
Dalibo

___
Developers mailing list
Developers@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] OCF_RESKEY_CRM_meta_notify_active_* always empty

2016-08-02 Thread Jehan-Guillaume de Rorthais
On Mon, 1 Aug 2016 12:00:24 -0500,
Ken Gaillot wrote:

> On 08/01/2016 11:18 AM, Jehan-Guillaume de Rorthais wrote:
> > On Mon, 1 Aug 2016 10:27:53 -0500,
> > Ken Gaillot wrote:
> > 
> >> On 07/29/2016 06:19 PM, Andrew Beekhof wrote:
> >>> Urgh. I must be confused with sles11. 
> >>> In any case, the first version of pacemaker was identical to the last
> >>> heartbeat crm. 
> >>>
> >>> I don't recall the ocfs2 agent changing design while I was there, so 11
> >>> may be broken too
> >>
> >> I just realized *_active_* is only broken for master/slave clones.
> >> Filesystem is not master/slave, so it wouldn't have any issue.
> > 
> > Well, I'm glad we are the first RA using it :)
> > 
> > I wonder how other m/s RAs are doing without it. We are using it (actually
> > "master + slave + start - stop" because of the bug) to check, during a
> > promotion after a failover, whether the resource being promoted is the best
> > one among the known ones.
> 
> Yes, the very simple workaround is simply to set active = master +
> slave; that's all the pacemaker fix will do. You'll still need the "+
> start - stop" to get the situation after the action.

Ok, thank you for the confirmation.

> Coincidentally, we need to bump crm_feature_set to 3.0.11 anyway, so
> you'll be able to test that to tell whether *_active_* is correct, if
> desired.

I will test it.

> There is an ocf_version_cmp function in ocf-shellfuncs.

Our RA is written in Perl... but we have ported most of ocf-shellfuncs to a
Perl module, including this function :)

cf. [ClusterLabs Developers] Perl Modules for resource agents
Thu, 26 Nov 2015 01:13:36 +0100

>  On 30 Jul 2016, at 8:51 AM, Ken Gaillot  wrote:
> 
> > On 07/29/2016 05:41 PM, Andrew Beekhof wrote:
> >
> >
> > Sent from my iPhone
> >
> >> On 30 Jul 2016, at 8:32 AM, Ken Gaillot  wrote:
> >>
> >> I finally had time to investigate this, and it definitely is broken.
> >>
> >> The only existing heartbeat RA to use the *_notify_active_* variables
> >> is Filesystem, and it only does so for OCFS2 on SLES10, which didn't
> >> even ship pacemaker,
> >
> > I'm pretty sure it did
> 
>  All I could find was:
> 
>  "SLES 10 did not yet ship pacemaker, but heartbeat with the builtin crm"
> 
>  http://oss.clusterlabs.org/pipermail/pacemaker/2014-July/022232.html
> 
>  I'm sure people were compiling it, and ClusterLabs probably even
>  provided a repo, but it looks like sles didn't ship it.
> 
>  The issue is that the code that builds the active list checks for role
>  RSC_ROLE_STARTED rather than RSC_ROLE_SLAVE + RSC_ROLE_MASTER, so I
>  don't think it ever would have worked.
> 
> >
> >> so I'm guessing it's been broken from the beginning of
> >> pacemaker.
> >>
> >> The fix looks straightforward, so I should be able to take care of it
> >> soon.
> >>
> >> Filed bug http://bugs.clusterlabs.org/show_bug.cgi?id=5295
> >>
> >>> On 05/08/2016 04:57 AM, Jehan-Guillaume de Rorthais wrote:
> >>> Le Fri, 6 May 2016 15:41:11 -0500,
> >>> Ken Gaillot  a écrit :
> >>>
> > On 05/03/2016 05:30 PM, Jehan-Guillaume de Rorthais wrote:
> > Le Tue, 3 May 2016 21:10:12 +0200,
> > Jehan-Guillaume de Rorthais  a écrit :
> >
> >> Le Mon, 2 May 2016 17:59:55 -0500,
> >> Ken Gaillot  a écrit :
> >>
>  On 04/28/2016 04:47 AM, Jehan-Guillaume de Rorthais wrote:
>  Hello all,
> 
>  While testing and experiencing with our RA for PostgreSQL, I
>  found the meta_notify_active_* variables seems always empty.
>  Here is an example of these variables as they are seen from our
>  RA during a migration/switchover:
> 
> 
>  {
>    'type' => 'pre',
>    'operation' => 'demote',
>    'active' => [],
>    'inactive' => [],
>    'start' => [],
>    'stop' => [],
>    'demote' => [
>  {
>    'rsc' => 'pgsqld:1',
>    'uname' => 'hanode1'
>  }
>    ],
> 
>    'master' => [
>  {
>    'rsc' => 'pgsqld:1',
>    'uname' => 'hanode1'
>  }
>    ],
> 
>    'promote' => [
>   {
> 'rsc' => 'pgsqld:0',
> 'uname' => 'hanode3'
>   }
> 

Re: [ClusterLabs Developers] OCF_RESKEY_CRM_meta_notify_active_* always empty

2016-08-01 Thread Ken Gaillot
On 08/01/2016 11:18 AM, Jehan-Guillaume de Rorthais wrote:
> On Mon, 1 Aug 2016 10:27:53 -0500,
> Ken Gaillot wrote:
> 
>> On 07/29/2016 06:19 PM, Andrew Beekhof wrote:
>>> Urgh. I must be confused with sles11. 
>>> In any case, the first version of pacemaker was identical to the last
>>> heartbeat crm. 
>>>
>>> I don't recall the ocfs2 agent changing design while I was there, so 11 may
>>> be broken too
>>
>> I just realized *_active_* is only broken for master/slave clones.
>> Filesystem is not master/slave, so it wouldn't have any issue.
> 
> Well, I'm glad we are the first RA using it :)
> 
> I wonder how other m/s RAs are doing without it. We are using it (actually
> "master + slave + start - stop" because of the bug) to check, during a
> promotion after a failover, whether the resource being promoted is the best
> one among the known ones.

Yes, the very simple workaround is simply to set active = master +
slave; that's all the pacemaker fix will do. You'll still need the "+
start - stop" to get the situation after the action.
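
As a rough illustration of that workaround, here is a minimal shell sketch. It
is not code from pacemaker or from the RA discussed here: notify_active_list is
a made-up helper name, and the four arguments are assumed to be the
space-separated node lists an agent would read from the
OCF_RESKEY_CRM_meta_notify_{master,slave,start,stop}_uname variables.

```shell
#!/bin/sh
# Hypothetical helper (not part of ocf-shellfuncs): rebuild the "active"
# list as master + slave + start - stop.
# $1=master $2=slave $3=start $4=stop, each a space-separated list of nodes.
notify_active_list() {
    master=$1 slave=$2 start=$3 stop=$4
    result=""
    for node in $master $slave $start; do
        # Leave out nodes that are about to be stopped.
        case " $stop " in *" $node "*) continue ;; esac
        # Leave out nodes already collected.
        case " $result " in *" $node "*) continue ;; esac
        result="${result:+$result }$node"
    done
    printf '%s\n' "$result"
}

# Values taken from the pre-demote dump quoted in this thread.
notify_active_list "hanode1" "hanode3 hanode2" "" ""
# prints: hanode1 hanode3 hanode2
```

Dropping the last two arguments (passing empty strings) gives the plain
"master + slave" list, i.e. the situation before the action.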

Coincidentally, we need to bump crm_feature_set to 3.0.11 anyway, so
you'll be able to test that to tell whether *_active_* is correct, if
desired. There is an ocf_version_cmp function in ocf-shellfuncs.
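
A possible shape for that test, as a self-contained sketch: a real shell agent
would more likely source ocf-shellfuncs and call ocf_version_cmp, so version_ge
below is only a stand-in written for illustration. It also assumes the feature
set is exposed to the agent as OCF_RESKEY_crm_feature_set.

```shell
#!/bin/sh
# Hypothetical stand-in for a version comparison helper.
# version_ge A B: succeeds when dotted numeric version A >= B.
version_ge() {
    a=$1 b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        # Take the leading field of each version.
        fa=${a%%.*}; fb=${b%%.*}
        # Drop the field just read (and its dot, if any).
        case $a in *.*) a=${a#*.} ;; *) a="" ;; esac
        case $b in *.*) b=${b#*.} ;; *) b="" ;; esac
        # Missing fields count as 0; compare numerically, not as strings.
        [ "${fa:-0}" -gt "${fb:-0}" ] && return 0
        [ "${fa:-0}" -lt "${fb:-0}" ] && return 1
    done
    return 0
}

# Assumption: an unset feature set is treated as "too old".
if version_ge "${OCF_RESKEY_crm_feature_set:-0}" "3.0.11"; then
    echo "notify_active_* should be usable as is"
else
    echo "falling back to master + slave"
fi
```

The numeric comparison matters here: a plain string comparison would rank
3.0.9 above 3.0.11.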

> 
 On 30 Jul 2016, at 8:51 AM, Ken Gaillot  wrote:

> On 07/29/2016 05:41 PM, Andrew Beekhof wrote:
>
>
> Sent from my iPhone
>
>> On 30 Jul 2016, at 8:32 AM, Ken Gaillot  wrote:
>>
>> I finally had time to investigate this, and it definitely is broken.
>>
>> The only existing heartbeat RA to use the *_notify_active_* variables is
>> Filesystem, and it only does so for OCFS2 on SLES10, which didn't even
>> ship pacemaker,
>
> I'm pretty sure it did

 All I could find was:

 "SLES 10 did not yet ship pacemaker, but heartbeat with the builtin crm"

 http://oss.clusterlabs.org/pipermail/pacemaker/2014-July/022232.html

 I'm sure people were compiling it, and ClusterLabs probably even
 provided a repo, but it looks like sles didn't ship it.

 The issue is that the code that builds the active list checks for role
 RSC_ROLE_STARTED rather than RSC_ROLE_SLAVE + RSC_ROLE_MASTER, so I
 don't think it ever would have worked.

>
>> so I'm guessing it's been broken from the beginning of
>> pacemaker.
>>
>> The fix looks straightforward, so I should be able to take care of it
>> soon.
>>
>> Filed bug http://bugs.clusterlabs.org/show_bug.cgi?id=5295
>>
>>> On 05/08/2016 04:57 AM, Jehan-Guillaume de Rorthais wrote:
>>> Le Fri, 6 May 2016 15:41:11 -0500,
>>> Ken Gaillot  a écrit :
>>>
> On 05/03/2016 05:30 PM, Jehan-Guillaume de Rorthais wrote:
> Le Tue, 3 May 2016 21:10:12 +0200,
> Jehan-Guillaume de Rorthais  a écrit :
>
>> Le Mon, 2 May 2016 17:59:55 -0500,
>> Ken Gaillot  a écrit :
>>
 On 04/28/2016 04:47 AM, Jehan-Guillaume de Rorthais wrote:
 Hello all,

 While testing and experiencing with our RA for PostgreSQL, I found
 the meta_notify_active_* variables seems always empty. Here is an
 example of these variables as they are seen from our RA during a
 migration/switchover:


 {
   'type' => 'pre',
   'operation' => 'demote',
   'active' => [],
   'inactive' => [],
   'start' => [],
   'stop' => [],
   'demote' => [
 {
   'rsc' => 'pgsqld:1',
   'uname' => 'hanode1'
 }
   ],

   'master' => [
 {
   'rsc' => 'pgsqld:1',
   'uname' => 'hanode1'
 }
   ],

   'promote' => [
  {
'rsc' => 'pgsqld:0',
'uname' => 'hanode3'
  }
],
   'slave' => [
{
  'rsc' => 'pgsqld:0',
  'uname' => 'hanode3'
},
{
  'rsc' => 'pgsqld:2',
  'uname' => 'hanode2'
}
  ],

 }

 In case this comes from our side, here is code building this:

 

Re: [ClusterLabs Developers] OCF_RESKEY_CRM_meta_notify_active_* always empty

2016-08-01 Thread Jehan-Guillaume de Rorthais
On Mon, 1 Aug 2016 10:27:53 -0500,
Ken Gaillot wrote:

> On 07/29/2016 06:19 PM, Andrew Beekhof wrote:
> > Urgh. I must be confused with sles11. 
> > In any case, the first version of pacemaker was identical to the last
> > heartbeat crm. 
> > 
> > I don't recall the ocfs2 agent changing design while I was there, so 11 may
> > be broken too
> 
> I just realized *_active_* is only broken for master/slave clones.
> Filesystem is not master/slave, so it wouldn't have any issue.

Well, I'm glad we are the first RA using it :)

I wonder how other m/s RAs are doing without it. We are using it (actually
"master + slave + start - stop" because of the bug) to check, during a
promotion after a failover, whether the resource being promoted is the best
one among the known ones.

> >> On 30 Jul 2016, at 8:51 AM, Ken Gaillot  wrote:
> >>
> >>> On 07/29/2016 05:41 PM, Andrew Beekhof wrote:
> >>>
> >>>
> >>> Sent from my iPhone
> >>>
>  On 30 Jul 2016, at 8:32 AM, Ken Gaillot  wrote:
> 
>  I finally had time to investigate this, and it definitely is broken.
> 
>  The only existing heartbeat RA to use the *_notify_active_* variables is
>  Filesystem, and it only does so for OCFS2 on SLES10, which didn't even
>  ship pacemaker,
> >>>
> >>> I'm pretty sure it did
> >>
> >> All I could find was:
> >>
> >> "SLES 10 did not yet ship pacemaker, but heartbeat with the builtin crm"
> >>
> >> http://oss.clusterlabs.org/pipermail/pacemaker/2014-July/022232.html
> >>
> >> I'm sure people were compiling it, and ClusterLabs probably even
> >> provided a repo, but it looks like sles didn't ship it.
> >>
> >> The issue is that the code that builds the active list checks for role
> >> RSC_ROLE_STARTED rather than RSC_ROLE_SLAVE + RSC_ROLE_MASTER, so I
> >> don't think it ever would have worked.
> >>
> >>>
>  so I'm guessing it's been broken from the beginning of
>  pacemaker.
> 
>  The fix looks straightforward, so I should be able to take care of it
>  soon.
> 
>  Filed bug http://bugs.clusterlabs.org/show_bug.cgi?id=5295
> 
> > On 05/08/2016 04:57 AM, Jehan-Guillaume de Rorthais wrote:
> > Le Fri, 6 May 2016 15:41:11 -0500,
> > Ken Gaillot  a écrit :
> >
> >>> On 05/03/2016 05:30 PM, Jehan-Guillaume de Rorthais wrote:
> >>> Le Tue, 3 May 2016 21:10:12 +0200,
> >>> Jehan-Guillaume de Rorthais  a écrit :
> >>>
>  Le Mon, 2 May 2016 17:59:55 -0500,
>  Ken Gaillot  a écrit :
> 
> >> On 04/28/2016 04:47 AM, Jehan-Guillaume de Rorthais wrote:
> >> Hello all,
> >>
> >> While testing and experiencing with our RA for PostgreSQL, I found
> >> the meta_notify_active_* variables seems always empty. Here is an
> >> example of these variables as they are seen from our RA during a
> >> migration/switchover:
> >>
> >>
> >> {
> >>   'type' => 'pre',
> >>   'operation' => 'demote',
> >>   'active' => [],
> >>   'inactive' => [],
> >>   'start' => [],
> >>   'stop' => [],
> >>   'demote' => [
> >> {
> >>   'rsc' => 'pgsqld:1',
> >>   'uname' => 'hanode1'
> >> }
> >>   ],
> >>
> >>   'master' => [
> >> {
> >>   'rsc' => 'pgsqld:1',
> >>   'uname' => 'hanode1'
> >> }
> >>   ],
> >>
> >>   'promote' => [
> >>  {
> >>'rsc' => 'pgsqld:0',
> >>'uname' => 'hanode3'
> >>  }
> >>],
> >>   'slave' => [
> >>{
> >>  'rsc' => 'pgsqld:0',
> >>  'uname' => 'hanode3'
> >>},
> >>{
> >>  'rsc' => 'pgsqld:2',
> >>  'uname' => 'hanode2'
> >>}
> >>  ],
> >>
> >> }
> >>
> >> In case this comes from our side, here is code building this:
> >>
> >> https://github.com/dalibo/PAF/blob/6e86284bc647ef1e81f01f047f1862e40ba62906/lib/OCF_Functions.pm#L444
> >>
> >> But looking at the variable itself in debug logs, I always find it
> >> empty, in various situations (switchover, recover, failover).
> >>
> >> If I understand the documentation correctly, I would expect
> >> 'active' to list all the three resources, shouldn't it? Currently,
> >> to bypass this, we consider: active == master + slave
> >
> > 

Re: [ClusterLabs Developers] OCF_RESKEY_CRM_meta_notify_active_* always empty

2016-08-01 Thread Ken Gaillot
On 07/29/2016 06:19 PM, Andrew Beekhof wrote:
> Urgh. I must be confused with sles11. 
> In any case, the first version of pacemaker was identical to the last 
> heartbeat crm. 
> 
> I don't recall the ocfs2 agent changing design while I was there, so 11 may 
> be broken too

I just realized *_active_* is only broken for master/slave clones.
Filesystem is not master/slave, so it wouldn't have any issue.

> Sent from my iPhone
> 
>> On 30 Jul 2016, at 8:51 AM, Ken Gaillot  wrote:
>>
>>> On 07/29/2016 05:41 PM, Andrew Beekhof wrote:
>>>
>>>
>>> Sent from my iPhone
>>>
 On 30 Jul 2016, at 8:32 AM, Ken Gaillot  wrote:

 I finally had time to investigate this, and it definitely is broken.

 The only existing heartbeat RA to use the *_notify_active_* variables is
 Filesystem, and it only does so for OCFS2 on SLES10, which didn't even
 ship pacemaker,
>>>
>>> I'm pretty sure it did
>>
>> All I could find was:
>>
>> "SLES 10 did not yet ship pacemaker, but heartbeat with the builtin crm"
>>
>> http://oss.clusterlabs.org/pipermail/pacemaker/2014-July/022232.html
>>
>> I'm sure people were compiling it, and ClusterLabs probably even
>> provided a repo, but it looks like sles didn't ship it.
>>
>> The issue is that the code that builds the active list checks for role
>> RSC_ROLE_STARTED rather than RSC_ROLE_SLAVE + RSC_ROLE_MASTER, so I
>> don't think it ever would have worked.
>>
>>>
 so I'm guessing it's been broken from the beginning of
 pacemaker.

 The fix looks straightforward, so I should be able to take care of it soon.

 Filed bug http://bugs.clusterlabs.org/show_bug.cgi?id=5295

> On 05/08/2016 04:57 AM, Jehan-Guillaume de Rorthais wrote:
> Le Fri, 6 May 2016 15:41:11 -0500,
> Ken Gaillot  a écrit :
>
>>> On 05/03/2016 05:30 PM, Jehan-Guillaume de Rorthais wrote:
>>> Le Tue, 3 May 2016 21:10:12 +0200,
>>> Jehan-Guillaume de Rorthais  a écrit :
>>>
 Le Mon, 2 May 2016 17:59:55 -0500,
 Ken Gaillot  a écrit :

>> On 04/28/2016 04:47 AM, Jehan-Guillaume de Rorthais wrote:
>> Hello all,
>>
>> While testing and experiencing with our RA for PostgreSQL, I found 
>> the
>> meta_notify_active_* variables seems always empty. Here is an 
>> example of
>> these variables as they are seen from our RA during a
>> migration/switchover:
>>
>>
>> {
>>   'type' => 'pre',
>>   'operation' => 'demote',
>>   'active' => [],
>>   'inactive' => [],
>>   'start' => [],
>>   'stop' => [],
>>   'demote' => [
>> {
>>   'rsc' => 'pgsqld:1',
>>   'uname' => 'hanode1'
>> }
>>   ],
>>
>>   'master' => [
>> {
>>   'rsc' => 'pgsqld:1',
>>   'uname' => 'hanode1'
>> }
>>   ],
>>
>>   'promote' => [
>>  {
>>'rsc' => 'pgsqld:0',
>>'uname' => 'hanode3'
>>  }
>>],
>>   'slave' => [
>>{
>>  'rsc' => 'pgsqld:0',
>>  'uname' => 'hanode3'
>>},
>>{
>>  'rsc' => 'pgsqld:2',
>>  'uname' => 'hanode2'
>>}
>>  ],
>>
>> }
>>
>> In case this comes from our side, here is code building this:
>>
>> https://github.com/dalibo/PAF/blob/6e86284bc647ef1e81f01f047f1862e40ba62906/lib/OCF_Functions.pm#L444
>>
>> But looking at the variable itself in debug logs, I always find it 
>> empty,
>> in various situations (switchover, recover, failover).
>>
>> If I understand the documentation correctly, I would expect 'active' 
>> to
>> list all the three resources, shouldn't it? Currently, to bypass 
>> this, we
>> consider: active == master + slave
>
> You're right, it should. The pacemaker code that generates the 
> "active"
> variables is the same used for "demote" etc., so it seems unlikely the
> issue is on pacemaker's side. Especially since your code treats active
> etc. differently from demote etc., it seems like it must be in there
> somewhere, but I don't see where.

 The code treat active, inactive, start and stop all together, for any
 cloned resource. If the resource is a multistate, it adds promote, 

Re: [ClusterLabs Developers] OCF_RESKEY_CRM_meta_notify_active_* always empty

2016-07-31 Thread Jehan-Guillaume de Rorthais
On Fri, 29 Jul 2016 17:32:14 -0500,
Ken Gaillot wrote:

> I finally had time to investigate this, and it definitely is broken.

Thanks for your investigation and the update on this, Ken.

Regards,
-- 
Jehan-Guillaume de Rorthais
Dalibo



Re: [ClusterLabs Developers] OCF_RESKEY_CRM_meta_notify_active_* always empty

2016-07-30 Thread Andrew Beekhof


Sent from my iPhone

> On 30 Jul 2016, at 8:32 AM, Ken Gaillot  wrote:
> 
> I finally had time to investigate this, and it definitely is broken.
> 
> The only existing heartbeat RA to use the *_notify_active_* variables is
> Filesystem, and it only does so for OCFS2 on SLES10, which didn't even
> ship pacemaker,

I'm pretty sure it did

> so I'm guessing it's been broken from the beginning of
> pacemaker.
> 
> The fix looks straightforward, so I should be able to take care of it soon.
> 
> Filed bug http://bugs.clusterlabs.org/show_bug.cgi?id=5295
> 
>> On 05/08/2016 04:57 AM, Jehan-Guillaume de Rorthais wrote:
>> Le Fri, 6 May 2016 15:41:11 -0500,
>> Ken Gaillot  a écrit :
>> 
 On 05/03/2016 05:30 PM, Jehan-Guillaume de Rorthais wrote:
 Le Tue, 3 May 2016 21:10:12 +0200,
 Jehan-Guillaume de Rorthais  a écrit :
 
> Le Mon, 2 May 2016 17:59:55 -0500,
> Ken Gaillot  a écrit :
> 
>>> On 04/28/2016 04:47 AM, Jehan-Guillaume de Rorthais wrote:
>>> Hello all,
>>> 
>>> While testing and experiencing with our RA for PostgreSQL, I found the
>>> meta_notify_active_* variables seems always empty. Here is an example of
>>> these variables as they are seen from our RA during a
>>> migration/switchover:
>>> 
>>> 
>>>  {
>>>'type' => 'pre',
>>>'operation' => 'demote',
>>>'active' => [],
>>>'inactive' => [],
>>>'start' => [],
>>>'stop' => [],
>>>'demote' => [
>>>  {
>>>'rsc' => 'pgsqld:1',
>>>'uname' => 'hanode1'
>>>  }
>>>],
>>> 
>>>'master' => [
>>>  {
>>>'rsc' => 'pgsqld:1',
>>>'uname' => 'hanode1'
>>>  }
>>>],
>>> 
>>>'promote' => [
>>>   {
>>> 'rsc' => 'pgsqld:0',
>>> 'uname' => 'hanode3'
>>>   }
>>> ],
>>>'slave' => [
>>> {
>>>   'rsc' => 'pgsqld:0',
>>>   'uname' => 'hanode3'
>>> },
>>> {
>>>   'rsc' => 'pgsqld:2',
>>>   'uname' => 'hanode2'
>>> }
>>>   ],
>>> 
>>>  }
>>> 
>>> In case this comes from our side, here is code building this:
>>> 
>>>  
>>> https://github.com/dalibo/PAF/blob/6e86284bc647ef1e81f01f047f1862e40ba62906/lib/OCF_Functions.pm#L444
>>> 
>>> But looking at the variable itself in debug logs, I always find it 
>>> empty,
>>> in various situations (switchover, recover, failover).
>>> 
>>> If I understand the documentation correctly, I would expect 'active' to
>>> list all the three resources, shouldn't it? Currently, to bypass this, 
>>> we
>>> consider: active == master + slave
>> 
>> You're right, it should. The pacemaker code that generates the "active"
>> variables is the same used for "demote" etc., so it seems unlikely the
>> issue is on pacemaker's side. Especially since your code treats active
>> etc. differently from demote etc., it seems like it must be in there
>> somewhere, but I don't see where.
> 
> The code treats active, inactive, start and stop all together, for any
> cloned resource. If the resource is multistate, it adds promote, demote,
> slave and master.
> 
> Note that from this piece of code, the 7 other notify vars are set
> correctly: start, stop, inactive, promote, demote, slave, master. Only
> active is always missing.
> 
> I'll investigate and try to find where the bug is hiding.
 
 So I added a piece of code to dump **all** the environment variables to
 a temp file as early as possible **to avoid any interaction with our Perl
 module** in the code of the RA, i.e.:
 
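  BEGIN {
    use Time::HiRes qw(time);
    my $now = time;
    # Dump every environment variable before any module can touch it.
    open(my $fh, ">", "/tmp/test-$now.env.txt")
      or die "cannot open dump file: $!";
    printf($fh "%-20s = ''%s''\n", $_, $ENV{$_}) foreach sort keys %ENV;
    close($fh);
  }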
 
 Then I started my cluster and set maintenance-mode=false while no resources
 were running. So the debug files contain the probe action, the start on all
 nodes, one promote on the master and the first monitors. The "*active"
 variables are always empty everywhere in the cluster. Attached is the
 result of the following command on the master node:
 
  for i in test-*; do echo "= $i ="; grep OCF_ "$i"; done > debug-env.txt
 
 I'm using Pacemaker 1.1.13-10.el7_2.2-44eb2dd under CentOS 7.2.1511.
 
 For completeness, I added the Pacemaker configuration I use for my 3 

Re: [ClusterLabs Developers] OCF_RESKEY_CRM_meta_notify_active_* always empty

2016-07-30 Thread Andrew Beekhof
Urgh. I must be confused with sles11. 
In any case, the first version of pacemaker was identical to the last heartbeat 
crm. 

I don't recall the ocfs2 agent changing design while I was there, so 11 may be 
broken too

Sent from my iPhone

> On 30 Jul 2016, at 8:51 AM, Ken Gaillot  wrote:
> 
>> On 07/29/2016 05:41 PM, Andrew Beekhof wrote:
>> 
>> 
>> Sent from my iPhone
>> 
>>> On 30 Jul 2016, at 8:32 AM, Ken Gaillot  wrote:
>>> 
>>> I finally had time to investigate this, and it definitely is broken.
>>> 
>>> The only existing heartbeat RA to use the *_notify_active_* variables is
>>> Filesystem, and it only does so for OCFS2 on SLES10, which didn't even
>>> ship pacemaker,
>> 
>> I'm pretty sure it did
> 
> All I could find was:
> 
> "SLES 10 did not yet ship pacemaker, but heartbeat with the builtin crm"
> 
> http://oss.clusterlabs.org/pipermail/pacemaker/2014-July/022232.html
> 
> I'm sure people were compiling it, and ClusterLabs probably even
> provided a repo, but it looks like sles didn't ship it.
> 
> The issue is that the code that builds the active list checks for role
> RSC_ROLE_STARTED rather than RSC_ROLE_SLAVE + RSC_ROLE_MASTER, so I
> don't think it ever would have worked.
> 
>> 
>>> so I'm guessing it's been broken from the beginning of
>>> pacemaker.
>>> 
>>> The fix looks straightforward, so I should be able to take care of it soon.
>>> 
>>> Filed bug http://bugs.clusterlabs.org/show_bug.cgi?id=5295
>>> 
 On 05/08/2016 04:57 AM, Jehan-Guillaume de Rorthais wrote:
 Le Fri, 6 May 2016 15:41:11 -0500,
 Ken Gaillot  a écrit :
 
>> On 05/03/2016 05:30 PM, Jehan-Guillaume de Rorthais wrote:
>> Le Tue, 3 May 2016 21:10:12 +0200,
>> Jehan-Guillaume de Rorthais  a écrit :
>> 
>>> Le Mon, 2 May 2016 17:59:55 -0500,
>>> Ken Gaillot  a écrit :
>>> 
> On 04/28/2016 04:47 AM, Jehan-Guillaume de Rorthais wrote:
> Hello all,
> 
> While testing and experiencing with our RA for PostgreSQL, I found the
> meta_notify_active_* variables seems always empty. Here is an example 
> of
> these variables as they are seen from our RA during a
> migration/switchover:
> 
> 
> {
>   'type' => 'pre',
>   'operation' => 'demote',
>   'active' => [],
>   'inactive' => [],
>   'start' => [],
>   'stop' => [],
>   'demote' => [
> {
>   'rsc' => 'pgsqld:1',
>   'uname' => 'hanode1'
> }
>   ],
> 
>   'master' => [
> {
>   'rsc' => 'pgsqld:1',
>   'uname' => 'hanode1'
> }
>   ],
> 
>   'promote' => [
>  {
>'rsc' => 'pgsqld:0',
>'uname' => 'hanode3'
>  }
>],
>   'slave' => [
>{
>  'rsc' => 'pgsqld:0',
>  'uname' => 'hanode3'
>},
>{
>  'rsc' => 'pgsqld:2',
>  'uname' => 'hanode2'
>}
>  ],
> 
> }
> 
> In case this comes from our side, here is code building this:
> 
> https://github.com/dalibo/PAF/blob/6e86284bc647ef1e81f01f047f1862e40ba62906/lib/OCF_Functions.pm#L444
> 
> But looking at the variable itself in debug logs, I always find it 
> empty,
> in various situations (switchover, recover, failover).
> 
> If I understand the documentation correctly, I would expect 'active' 
> to
> list all the three resources, shouldn't it? Currently, to bypass 
> this, we
> consider: active == master + slave
 
 You're right, it should. The pacemaker code that generates the "active"
 variables is the same used for "demote" etc., so it seems unlikely the
 issue is on pacemaker's side. Especially since your code treats active
 etc. differently from demote etc., it seems like it must be in there
 somewhere, but I don't see where.
>>> 
>>> The code treat active, inactive, start and stop all together, for any
>>> cloned resource. If the resource is a multistate, it adds promote, 
>>> demote,
>>> slave and master.
>>> 
>>> Note that from this piece of code, the 7 other notify vars are set
>>> correctly: start, stop, inactive, promote, demote, slave, master. Only
>>> active is always missing.
>>> 
>>> I'll investigate and try to find where is hiding 

Re: [ClusterLabs Developers] OCF_RESKEY_CRM_meta_notify_active_* always empty

2016-07-29 Thread Ken Gaillot
On 07/29/2016 05:41 PM, Andrew Beekhof wrote:
> 
> 
> Sent from my iPhone
> 
>> On 30 Jul 2016, at 8:32 AM, Ken Gaillot  wrote:
>>
>> I finally had time to investigate this, and it definitely is broken.
>>
>> The only existing heartbeat RA to use the *_notify_active_* variables is
>> Filesystem, and it only does so for OCFS2 on SLES10, which didn't even
>> ship pacemaker,
> 
> I'm pretty sure it did

All I could find was:

"SLES 10 did not yet ship pacemaker, but heartbeat with the builtin crm"

http://oss.clusterlabs.org/pipermail/pacemaker/2014-July/022232.html

I'm sure people were compiling it, and ClusterLabs probably even
provided a repo, but it looks like sles didn't ship it.

The issue is that the code that builds the active list checks for role
RSC_ROLE_STARTED rather than RSC_ROLE_SLAVE + RSC_ROLE_MASTER, so I
don't think it ever would have worked.
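
To make that concrete, here is a toy model in shell. It is not the pacemaker
code: list_by_role and the "instance=role" notation are invented for this
illustration, and the role names simply mirror the constants mentioned above.

```shell
#!/bin/sh
# Toy model only: select instances whose role is one of the wanted roles.
# $1: space-separated "instance=role" pairs; remaining arguments: roles to keep.
list_by_role() {
    pairs=$1; shift
    out=""
    for pair in $pairs; do
        role=${pair##*=}
        for wanted in "$@"; do
            if [ "$role" = "$wanted" ]; then
                out="${out:+$out }${pair%=*}"
            fi
        done
    done
    printf '%s\n' "$out"
}

# Instances of a master/slave clone are never in the plain "Started" role.
instances="pgsqld:0=Slave pgsqld:1=Master pgsqld:2=Slave"

list_by_role "$instances" Started        # prints an empty line
list_by_role "$instances" Slave Master   # prints: pgsqld:0 pgsqld:1 pgsqld:2
```

For a plain (non master/slave) clone the instances would carry the Started
role, which is consistent with the list being filled in that case.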

> 
>> so I'm guessing it's been broken from the beginning of
>> pacemaker.
>>
>> The fix looks straightforward, so I should be able to take care of it soon.
>>
>> Filed bug http://bugs.clusterlabs.org/show_bug.cgi?id=5295
>>
>>> On 05/08/2016 04:57 AM, Jehan-Guillaume de Rorthais wrote:
>>> Le Fri, 6 May 2016 15:41:11 -0500,
>>> Ken Gaillot  a écrit :
>>>
> On 05/03/2016 05:30 PM, Jehan-Guillaume de Rorthais wrote:
> Le Tue, 3 May 2016 21:10:12 +0200,
> Jehan-Guillaume de Rorthais  a écrit :
>
>> Le Mon, 2 May 2016 17:59:55 -0500,
>> Ken Gaillot  a écrit :
>>
 On 04/28/2016 04:47 AM, Jehan-Guillaume de Rorthais wrote:
 Hello all,

 While testing and experiencing with our RA for PostgreSQL, I found the
 meta_notify_active_* variables seems always empty. Here is an example 
 of
 these variables as they are seen from our RA during a
 migration/switchover:


  {
'type' => 'pre',
'operation' => 'demote',
'active' => [],
'inactive' => [],
'start' => [],
'stop' => [],
'demote' => [
  {
'rsc' => 'pgsqld:1',
'uname' => 'hanode1'
  }
],

'master' => [
  {
'rsc' => 'pgsqld:1',
'uname' => 'hanode1'
  }
],

'promote' => [
   {
 'rsc' => 'pgsqld:0',
 'uname' => 'hanode3'
   }
 ],
'slave' => [
 {
   'rsc' => 'pgsqld:0',
   'uname' => 'hanode3'
 },
 {
   'rsc' => 'pgsqld:2',
   'uname' => 'hanode2'
 }
   ],

  }

 In case this comes from our side, here is code building this:

  
 https://github.com/dalibo/PAF/blob/6e86284bc647ef1e81f01f047f1862e40ba62906/lib/OCF_Functions.pm#L444

 But looking at the variable itself in debug logs, I always find it 
 empty,
 in various situations (switchover, recover, failover).

 If I understand the documentation correctly, I would expect 'active' to
 list all the three resources, shouldn't it? Currently, to bypass this, 
 we
 consider: active == master + slave
>>>
>>> You're right, it should. The pacemaker code that generates the "active"
>>> variables is the same used for "demote" etc., so it seems unlikely the
>>> issue is on pacemaker's side. Especially since your code treats active
>>> etc. differently from demote etc., it seems like it must be in there
>>> somewhere, but I don't see where.
>>
>> The code treat active, inactive, start and stop all together, for any
>> cloned resource. If the resource is a multistate, it adds promote, 
>> demote,
>> slave and master.
>>
>> Note that from this piece of code, the 7 other notify vars are set
>> correctly: start, stop, inactive, promote, demote, slave, master. Only
>> active is always missing.
>>
>> I'll investigate and try to find where is hiding the bug.
>
> So I added a piece of code to dump the **all** the environment variables 
> to
> a temp file as early as possible **to avoid any interaction with our perl
> module** in the code of the RA, ie.:
>
>  BEGIN {
>use Time::HiRes qw(time);
>my $now = time;
>open my $fh, ">", "/tmp/test-$now.env.txt";
>printf($fh "%-20s = ''%s''\n", $_, $ENV{$_}) foreach sort keys %ENV;

Re: [ClusterLabs Developers] OCF_RESKEY_CRM_meta_notify_active_* always empty

2016-07-29 Thread Ken Gaillot
I finally had time to investigate this, and it definitely is broken.

The only existing heartbeat RA to use the *_notify_active_* variables is
Filesystem, and it only does so for OCFS2 on SLES10, which didn't even
ship pacemaker, so I'm guessing it's been broken from the beginning of
pacemaker.

The fix looks straightforward, so I should be able to take care of it soon.

Filed bug http://bugs.clusterlabs.org/show_bug.cgi?id=5295

On 05/08/2016 04:57 AM, Jehan-Guillaume de Rorthais wrote:
> Le Fri, 6 May 2016 15:41:11 -0500,
> Ken Gaillot  a écrit :
> 
>> On 05/03/2016 05:30 PM, Jehan-Guillaume de Rorthais wrote:
>>> Le Tue, 3 May 2016 21:10:12 +0200,
>>> Jehan-Guillaume de Rorthais  a écrit :
>>>
 Le Mon, 2 May 2016 17:59:55 -0500,
 Ken Gaillot  a écrit :

> On 04/28/2016 04:47 AM, Jehan-Guillaume de Rorthais wrote:
>> Hello all,
>>
>> While testing and experimenting with our RA for PostgreSQL, I found that
>> the meta_notify_active_* variables always seem to be empty. Here is an
>> example of these variables as they are seen from our RA during a
>> migration/switchover:
>>
>>
>>   {
>> 'type' => 'pre',
>> 'operation' => 'demote',
>> 'active' => [],
>> 'inactive' => [],
>> 'start' => [],
>> 'stop' => [],
>> 'demote' => [
>>   {
>> 'rsc' => 'pgsqld:1',
>> 'uname' => 'hanode1'
>>   }
>> ],
>> 
>> 'master' => [
>>   {
>> 'rsc' => 'pgsqld:1',
>> 'uname' => 'hanode1'
>>   }
>> ],
>> 
>> 'promote' => [
>>{
>>  'rsc' => 'pgsqld:0',
>>  'uname' => 'hanode3'
>>}
>>  ],
>> 'slave' => [
>>  {
>>'rsc' => 'pgsqld:0',
>>'uname' => 'hanode3'
>>  },
>>  {
>>'rsc' => 'pgsqld:2',
>>'uname' => 'hanode2'
>>  }
>>],
>> 
>>   }
>>
>> In case this comes from our side, here is code building this:
>>
>>   
>> https://github.com/dalibo/PAF/blob/6e86284bc647ef1e81f01f047f1862e40ba62906/lib/OCF_Functions.pm#L444
>>
>> But looking at the variable itself in debug logs, I always find it empty,
>> in various situations (switchover, recover, failover).
>>
>> If I understand the documentation correctly, I would expect 'active' to
>> list all the three resources, shouldn't it? Currently, to bypass this, we
>> consider: active == master + slave
>
> You're right, it should. The pacemaker code that generates the "active"
> variables is the same used for "demote" etc., so it seems unlikely the
> issue is on pacemaker's side. Especially since your code treats active
> etc. differently from demote etc., it seems like it must be in there
> somewhere, but I don't see where.

 The code treat active, inactive, start and stop all together, for any
 cloned resource. If the resource is a multistate, it adds promote, demote,
 slave and master.

 Note that from this piece of code, the 7 other notify vars are set
 correctly: start, stop, inactive, promote, demote, slave, master. Only
 active is always missing.

 I'll investigate and try to find where is hiding the bug.
>>>
>>> So I added a piece of code to dump **all** the environment variables to
>>> a temp file as early as possible, **to avoid any interaction with our perl
>>> module** in the code of the RA, i.e.:
>>>
>>>   BEGIN {
>>>     use Time::HiRes qw(time);
>>>     my $now = time;
>>>     open my $fh, ">", "/tmp/test-$now.env.txt";
>>>     printf($fh "%-20s = ''%s''\n", $_, $ENV{$_}) foreach sort keys %ENV;
>>>   }
>>>
>>> Then I started my cluster and set maintenance-mode=false while no resources
>>> were running. So the debug files contain the probe action, the start on all
>>> nodes, one promote on the master and the first monitors. The "*active"
>>> variables are always empty anywhere in the cluster. Find attached the
>>> result of the following command on the master node:
>>>
>>>   for i in test-*; do echo "= $i ="; grep OCF_ "$i"; done > debug-env.txt
>>>
>>> I'm using Pacemaker 1.1.13-10.el7_2.2-44eb2dd under CentOS 7.2.1511.
>>>
>>> For completeness, I added the Pacemaker configuration I use for my 3 node
>>> dev/test cluster.
>>>
>>> Let me know if you think of more investigation or tests I could run on
>>> this issue. I'm out of ideas for tonight (and I really would prefer having
>>> this bug on my side).
>>
>> From your environment 

Re: [ClusterLabs Developers] OCF_RESKEY_CRM_meta_notify_active_* always empty

2016-05-02 Thread Ken Gaillot
On 04/28/2016 04:47 AM, Jehan-Guillaume de Rorthais wrote:
> Hello all,
> 
> While testing and experimenting with our RA for PostgreSQL, I found that the
> meta_notify_active_* variables always seem to be empty. Here is an example of
> these variables as seen from our RA during a migration/switchover:
> 
> 
>   {
>     'type' => 'pre',
>     'operation' => 'demote',
>     'active' => [],
>     'inactive' => [],
>     'start' => [],
>     'stop' => [],
>     'demote' => [
>       {
>         'rsc' => 'pgsqld:1',
>         'uname' => 'hanode1'
>       }
>     ],
>     'master' => [
>       {
>         'rsc' => 'pgsqld:1',
>         'uname' => 'hanode1'
>       }
>     ],
>     'promote' => [
>       {
>         'rsc' => 'pgsqld:0',
>         'uname' => 'hanode3'
>       }
>     ],
>     'slave' => [
>       {
>         'rsc' => 'pgsqld:0',
>         'uname' => 'hanode3'
>       },
>       {
>         'rsc' => 'pgsqld:2',
>         'uname' => 'hanode2'
>       }
>     ],
>   }
> 
> In case this comes from our side, here is the code building this structure:
> 
>   https://github.com/dalibo/PAF/blob/6e86284bc647ef1e81f01f047f1862e40ba62906/lib/OCF_Functions.pm#L444
> 
> But looking at the variable itself in debug logs, I always find it empty, in
> various situations (switchover, recover, failover).
> 
> If I understand the documentation correctly, I would expect 'active' to list
> all three resources, shouldn't it? Currently, to work around this, we
> consider: active == master + slave

You're right, it should. The pacemaker code that generates the "active"
variables is the same used for "demote" etc., so it seems unlikely the
issue is on pacemaker's side. Especially since your code treats active
etc. differently from demote etc., it seems like it must be in there
somewhere, but I don't see where.

Which debug logs are you referring to?
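
One way to rule out the RA's own parsing would be to print the raw pairs
straight from the environment, bypassing OCF_Functions.pm. The following
is only a minimal sketch: it assumes that
OCF_RESKEY_CRM_meta_notify_<kind>_resource and
OCF_RESKEY_CRM_meta_notify_<kind>_uname hold space-separated lists in
matching order, and notify_pairs is a hypothetical name, not an existing
helper.

```shell
#!/bin/sh
# Hypothetical helper: print one "rsc uname" pair per line for a given
# notify list (active, inactive, start, stop, master, slave, promote,
# demote), read directly from the environment.
notify_pairs() {
    kind=$1
    # eval is used only to build the variable name; "kind" is expected to
    # be one of the fixed keywords above, never untrusted input.
    eval "rscs=\${OCF_RESKEY_CRM_meta_notify_${kind}_resource:-}"
    eval "unames=\${OCF_RESKEY_CRM_meta_notify_${kind}_uname:-}"
    # Load the node names into the positional parameters.
    set -- $unames
    for rsc in $rscs; do
        # "?" marks a resource with no matching node name.
        echo "$rsc ${1:-?}"
        if [ $# -gt 0 ]; then
            shift
        fi
    done
}
```

Calling notify_pairs active and notify_pairs slave from the notify action
and comparing the two outputs should show whether the active list is
already empty before any module touches it.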

> Comments? Help?
> 
> Regards,
> 

