Re: [ClusterLabs Developers] libqb: Re: 633f262 logging: Remove linker 'magic' and just use statics for logging callsites (#322)

2019-01-16 Thread Andrew Beekhof



> On 17 Jan 2019, at 2:59 am, Ken Gaillot  wrote:
> 
> I'm not familiar with the reasoning for the current setup, but
> pacemaker's crm_crit(), crm_error(), etc. use qb_logt(), while
> crm_debug() and crm_trace() (which won't be used in ordinary runs) do
> something similar to what you propose.
> 
> Pacemaker has about 1,700 logging calls that would be affected (not
> counting another 2,000 debug/trace). Presumably that means Pacemaker
> currently has about 16KB of memory overhead and binary size for the
> debug/trace logging static pointers (2,000 pointers at 8 bytes each),
> and that would almost double if they were used for all logs. Not a big
> deal today? Or meaningful in an embedded context?
> 
> Not sure if that overhead vs runtime trade-off is the original
> motivation or not, but that's the first thing that comes to mind.

I believe my interest was the ability to turn them on dynamically in a running 
program (yes, I used it plenty back in the day) and have the overhead be 
minimal for the normal case when they weren't in use.
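
For illustration, a minimal sketch of that kind of runtime toggling using
libqb's existing filter API (hedged: check qblog.h for the exact enum and
macro names):

    #include <qb/qblog.h>

    /* Turn trace-level callsites in one source file on or off at runtime.
     * Callsites that no filter targets keep cs->targets == 0, so with the
     * static-pointer scheme discussed below they stay on the cheap path. */
    void set_file_tracing(const char *source_file, int enable)
    {
        qb_log_filter_ctl(QB_LOG_SYSLOG,
                          enable ? QB_LOG_FILTER_ADD : QB_LOG_FILTER_REMOVE,
                          QB_LOG_FILTER_FILE, source_file, LOG_TRACE);
    }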

> 
> On Wed, 2019-01-16 at 16:20 +0100, Lars Ellenberg wrote:
>> On Wed, Jan 16, 2019 at 03:44:22PM +0100, Lars Ellenberg wrote:
>>> Back then when this "dynamic" logging was introduced,
>>> I thought the whole point was that "quiet" callsites
>>> are "cheap".
>>> 
>>> So I think you want to
>>> qb_log_callsite_get() only *once* per callsite,
>>> assign that to a static pointer (as you say in the commit message).
>>> And do the actual qb_log_real_() function call
>>> only conditionally based on if (cs->targets). 
>>> 
>>> That way, a disabled trace logging call boils down to
>>> if (cs && cs->targets)
>>>     ;
>>> Much cheaper than what you have now.
>>> 
>>> Or was always calling into both qb_log_callsite_get() and
>>> qb_log_real_()
>>> intentional for some obscure (to me) reason,
>>> even for disabled call sites?
>>> 
>>> Below I also added a test for (cs->tags & QB_LOG_TAG_LIBQB_MSG),
>>> in case someone used qb_util_set_log_function().
>>> If that special tag flag could be folded into cs->targets (e.g. bit
>>> 0),
>>> I'd like it even better.
>>> 
>>> Cheers,
>>>Lars
>>> 
>>> 
>>> diff --git a/include/qb/qblog.h b/include/qb/qblog.h
>>> index 1943b94..f63f4ad 100644
>> 
>> Oops, that patch was against *before* the 633f262 commit :-)
>> and that's why I did not notice the clash between the macro argument "tags"
>> and the struct member "tags" when compile testing...
>> I forgot I jumped between checkouts :-(
>> 
>> Anyway, the (now even make-check tested)
>> patch for *after* the 633f262 commit:
>> 
>> diff --git a/include/qb/qblog.h b/include/qb/qblog.h
>> index 31981b8..ae1d25c 100644
>> --- a/include/qb/qblog.h
>> +++ b/include/qb/qblog.h
>> @@ -340,11 +340,17 @@ void qb_log_from_external_source_va(const char
>> *function,
>>  * @param fmt usual printf style format specifiers
>>  * @param args usual printf style args
>>  */
>> -#define qb_logt(priority, tags, fmt, args...) do {  \
>> -struct qb_log_callsite* descriptor_pt = \
>> -qb_log_callsite_get(__func__, __FILE__, fmt,\
>> -priority, __LINE__, tags);  \
>> -qb_log_real_(descriptor_pt, ##args);\
>> +#define qb_logt(priority, tags_, fmt, args...) do { \
>> +static struct qb_log_callsite* descriptor_pt;   \
>> +if (!descriptor_pt) {   \
>> +descriptor_pt = \
>> +qb_log_callsite_get(__func__, __FILE__, fmt,\
>> +priority, __LINE__, tags_); \
>> +}   \
>> +if (descriptor_pt && (descriptor_pt->targets || \
>> +qb_bit_is_set(descriptor_pt->tags, \
>> +QB_LOG_TAG_LIBQB_MSG_BIT))) \
>> +qb_log_real_(descriptor_pt, ##args);\
>> } while(0)
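
To make the effect concrete, a hedged sketch of what a caller sees once this
patch is applied: the first pass pays one qb_log_callsite_get(), and after
that an untargeted callsite costs only a couple of cheap tests:

    #include <qb/qblog.h>

    static void hot_loop(void)
    {
        /* While this callsite is untargeted, each iteration after the
         * first is just the static-pointer and targets checks;
         * qb_log_real_() is never called until a filter sets
         * descriptor_pt->targets. */
        for (int i = 0; i < 1000000; i++) {
            qb_logt(LOG_TRACE, 0, "value is %d", i);
        }
    }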
>> 
>> ___
>> Developers mailing list
>> Developers@clusterlabs.org
>> https://lists.clusterlabs.org/mailman/listinfo/developers
> -- 
> Ken Gaillot 
> 
> ___
> Developers mailing list
> Developers@clusterlabs.org
> https://lists.clusterlabs.org/mailman/listinfo/developers

___
Developers mailing list
Developers@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] Resurrecting OCF

2016-09-20 Thread Andrew Beekhof
I like where this is going.
I don’t think we want to get into the business of trying to script 
config changes from one agent to another, though, so I’d drop #4


I would make .deprecated a nested directory so that if we want to retire (for 
example) a ClusterLabs agent in the future we can create 
.deprecated/clusterlabs/ and put the agent there, rather than making this 
heartbeat-specific.

I wonder if some of this should live in pacemaker itself though…
If resources_action_create() cannot find ocf:${provider}:${agent} in its usual 
location, look up ${OCF_ROOT_DIR}/.compat/${provider}/__entries__

Format for __entries__:
   # old, replacement
   # ${agent} , ${new_provider}:${new_agent} , ${description}
   IPaddr , clusterlabs:IP , Replaced with different semantics
   IPaddr2 , clusterlabs:IP , Moved
   drbd , linbit:drbd , Moved
   eDirectory , , Deleted

Assuming an entry is found:
- If .compat/${old_provider}/${old_agent} exists, notify the user “somehow”, 
then call it.
- Otherwise, return OCF_ERR_NOT_INSTALLED and use ${description} and 
${replacement} as the exit reason (which shows up in pcs status).
A rough sketch of the lookup follows.
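
A hypothetical C sketch of that lookup (compat_lookup() and the paths are
illustrative, not existing pacemaker API):

    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>

    /* Strip leading/trailing whitespace in place; returns the start. */
    static char *trim(char *s)
    {
        char *end;
        while (isspace((unsigned char) *s)) s++;
        end = s + strlen(s);
        while (end > s && isspace((unsigned char) end[-1])) *--end = '\0';
        return s;
    }

    /* Scan an __entries__ file for 'agent'; on a match, print the
     * replacement and description fields and return 1, else 0. */
    static int compat_lookup(const char *entries_path, const char *agent)
    {
        char line[512];
        int found = 0;
        FILE *f = fopen(entries_path, "r");

        if (f == NULL) return 0;
        while (!found && fgets(line, sizeof(line), f) != NULL) {
            char *save = NULL;
            char *old = strtok_r(line, ",", &save);
            char *repl = strtok_r(NULL, ",", &save);
            char *desc = strtok_r(NULL, "\n", &save);

            if (old == NULL) continue;
            old = trim(old);
            if (*old == '\0' || *old == '#') continue;  /* blank/comment */
            if (strcmp(old, agent) != 0) continue;

            printf("replacement: %s\n", repl ? trim(repl) : "");
            printf("description: %s\n", desc ? trim(desc) : "");
            found = 1;
        }
        fclose(f);
        return found;
    }

    int main(void)
    {
        /* e.g. ${OCF_ROOT_DIR}/.compat/heartbeat/__entries__ */
        return compat_lookup("/usr/lib/ocf/.compat/heartbeat/__entries__",
                             "IPaddr2") ? 0 : 1;
    }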

Perhaps the “somehow” is creating PCMK_OCF_DEPRECATED (with the same semantics 
as PCMK_OCF_DEGRADED) and prepending ${description} to the output (assuming it's 
not a metadata op) and/or the exit reason[1].  Maybe only on successful start 
operations, to minimise the noise?

[1] Shouldn’t be too hard with some extra fields for 'struct 
svc_action_private_s’ or svc_action_t


> On 19 Aug 2016, at 6:59 PM, Jan Pokorný  wrote:
> 
> On 18/08/16 17:27 +0200, Klaus Wenninger wrote:
>> On 08/18/2016 05:16 PM, Ken Gaillot wrote:
>>> On 08/18/2016 08:31 AM, Kristoffer Grönlund wrote:
 Jan Pokorný  writes:
 
> Thinking about that, ClusterLabs may be considered a brand established
> well enough for "clusterlabs" provider to work better than anything
> general such as previously proposed "core".  Also, it's not expected
> there will be more RA-centered projects under this umbrella than
> resource-agents (pacemaker deserves to be a provider on its own),
> so it would be a pretty unambiguous pointer.
 I like this suggestion as well.
>>> Sounds good to me.
>>> 
> And for new, not well-tested agents within resource-agents, there could
> also be a provider schema akin to "clusterlabs-staging" introduced.
> 
> 1 CZK
 ...and this too.
>>> I'd rather not see this. If the RA gets promoted to "well-tested",
>>> everyone's configuration has to change. And there's never a clear line
>>> between "not well-tested" and "well-tested", so things wind up staying
>>> in "beta" status long after they're widely used in production, which
>>> unnecessarily makes people question their reliability.
>>> 
>>> If an RA is considered experimental, say so in the documentation
>>> (including the man page and help text), and give it a "0.x" version number.
>>> 
 Here is another one: While we are moving agents into a new namespace,
 perhaps it is time to clean up some of the legacy agents that are no
 longer recommended or of questionable quality? Off the top of my head,
 there are
 
 * heartbeat/Evmsd
 * heartbeat/EvmsSCC
 * heartbeat/LinuxSCSI
 * heartbeat/pingd
 * heartbeat/IPaddr
 * heartbeat/ManageRAID
 * heartbeat/vmware
 
 A pet peeve of mine would also be to move heartbeat/IPaddr2 to
 clusterlabs/IP, to finally get rid of that weird 2 in the name...
>>> +1!!! (or is it -2?)
>>> 
 Cheers,
 Kristoffer
>>> Obviously, we need to keep the ocf:heartbeat provider around for
>>> backward compatibility, for the extensive existing uses both in cluster
>>> configurations and in the zillions of how-to's scattered around the web.
>>> 
>>> Also, despite the recommendation of creating your own provider, many
>>> people drop custom RAs in the heartbeat directory.
>>> 
>>> The simplest approach would be to just symlink heartbeat to clusterlabs,
>>> but I think that's a bad idea. If a custom RA deployment or some package
>>> other than resource-agents puts an RA there, resource-agents will try to
>>> make it a symlink and the other package will try to make it a directory.
>>> Plus, people may have configuration management systems and/or file
>>> integrity systems that need it to be a directory.
>>> 
>>> So, I'd recommend we keep the heartbeat directory, and keep the old RAs
>>> you list above in it, move the rest of the RAs to the new clusterlabs
>>> directory, and symlink each one back to the heartbeat directory. At the
>>> same time, we can announce the heartbeat provider as deprecated, and
>>> after a very long time (when it's difficult to find references to it via
>>> google), we can drop it.
>> 
>> Maybe a way to go for the staging-RAs as well:
>> Have them in clusterlabs-staging and symlinked (during install
>> or package-generation) into clusterlabs ... while they are
>> cleanly 

Re: [ClusterLabs Developers] Potential logo for Cluster Labs

2016-08-26 Thread Andrew Beekhof

> On 26 Aug 2016, at 1:48 AM, Digimer  wrote:
> 
> On 25/08/16 11:35 AM, Kristoffer Grönlund wrote:
>> Digimer  writes:
>> 
>>> 
>>> My first reaction was "Meh, I like Ken's idea better". But then it
>>> started to sink in and I have to agree, I love it too. It's a brilliant
>>> concept that wonderfully plays on our group's name and purpose. I might
>>> like to refine the idea a touch, but I really love the cleverness and
>>> simplicity of the logo. This gets my +1.
>>> 
>> 
>> Wow, thank you as well! And I officially give you, and anyone else who'd
>> like to, permission to take the idea and run with it in any way you want,
>> tweak or redo or whatever. In the spirit of Linux turning 20 I donate
>> the beaker idea and image to the community ;)
> 
> My offer to have our designer work on the logo stands. If the community
> likes this idea, then as soon as I return from my vacation (Sep. 16th),
> I'll contact him and ask him to create a few variants on the design. As
> soon as he gets back to me, I'll pass them along to this list. He can
> work on the colour palette as well, which could help with the next
> website design and provide a consistent visual theme.
> 
> Of course, in the same spirit, all work will be given to the Cluster
> Labs community with no strings attached. :)

Unless the strings were required to stop the logo from falling down, of course

> 
> -- 
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?
> 
> ___
> Developers mailing list
> Developers@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/developers


___
Developers mailing list
Developers@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] OCF under the Linux Foundation?

2016-08-15 Thread Andrew Beekhof
I’d just be cautious about giving them the keys to anything.
It took many months for them to bring their bugzilla instance back online and 
we lost many years of bug data.

Another consideration, between ourselves we can make decisions pretty 
constructively and quickly… is a layer of red tape (slow) over that going to be 
beneficial?

> On 16 Aug 2016, at 3:05 AM, Digimer  wrote:
> 
> My take at this point is this:
> 
> We're a small group, all things considered, and we're all working fairly
> well together at this time. We've all got our particular focus and, so
> far as I have seen (and I admit to not seeing a whole lot), coordinating
> between projects is going fine.
> 
> So my question is:
> 
> The overhead of a more official organization, charter, etc. comes at a
> time cost. What do we get out of it? If the benefits outweigh the time
> costs, sure. Otherwise, I think we're fine staying under the Clusterlabs
> umbrella for the time being.
> 
> Again, I know my view of HA is hardly complete, so this is just my take
> on it.
> 
> digimer
> 
> On 15/08/16 12:34 PM, Ken Gaillot wrote:
>> I got a response from the Linux Foundation regarding the OCF name. They
>> are willing to host a working group if we want a neutral home for it. He
>> didn't explicitly address the question, but I believe the LF would have
>> no objections to ClusterLabs taking over the OCF name if we don't want
>> to go that route.
>> 
>> I've attached the sample charter he mentioned in case anyone wants to
>> see it. We wouldn't have to set ours up identically, but it's a reference point.
>> 
>> I think the naming conflicts he mentions are not serious, because (1)
>> our usage predates either of those, and (2) there are even more existing
>> computer-related uses of the OCF acronym (see
>> https://en.wikipedia.org/wiki/OCF ... Open Computing Facility,
>> OpenBSD/FreeBSD Cryptographic Framework, OpenCard Framework, Original
>> Composite Font).
>> 
>> How does everyone feel about this? Should we host the OCF standards
>> under the Linux Foundation, for greater reach and authority, and clear
>> neutrality? Or should we bring it under ClusterLabs, to keep everything
>> as simple as possible (and perhaps emphasize support for OSes beyond Linux)?
>> 
>>  Forwarded Message 
>> Subject: Re: Open Cluster Framework
>> Date:Mon, 15 Aug 2016 12:12:46 -0400
>> From:Michael Dolan 
>> To:  kgail...@redhat.com
>> CC:  Mike Woster 
>> 
>> 
>> 
>> Hi Ken, is this something you would prefer to have at the LF? We could
>> set up a lightweight governance model and let the community drive all the
>> decisions under a working group model under the LF. We just announced a
>> similar structure for Open vSwitch and would be amenable to hosting this
>> similarly. I'm pasting the governance documents here so you can see what
>> that looked like. They didn't want any membership levels or fees so it's
>> just a technical collaboration effort and very lightweight. However,
>> giving it a home at the LF allowed them to neutralize any argument that the
>> project was under the control of any one company. They assigned the
>> domain and trademark rights to the LF to make it neutral.
>> 
>> I will point out there are a few "OCF" standards out there now that are
>> already in naming conflict. First there was the "Open Container Format"
>> or "OCF Certified" by the Open Container Initiative we host. They
>> already filed for a registered trademark. They standardized the Docker
>> container format for broader industry use.
>> 
>> The other is the Open Connectivity Foundation which is a standards body.
>> That one is not directly affiliated with the LF, but we host the
>> IoTivity open source project they sponsor so we're aware of their
>> activities. They have an OCF brand I believe they were planning to use
>> for IoT devices that implement their specification standard.
>> 
>> I'd be happy to jump on a call if it would be easier to discuss live.
>> Thanks,
>> 
>> Mike
>> 
>> 
>> 
>> ---
>> Mike Dolan
>> VP of Strategic Programs
>> The Linux Foundation
>> Office: +1.330.460.3250   Cell: +1.440.552.5322  Skype: michaelkdolan
>> mdo...@linuxfoundation.org 
>> ---
>> 
>> 
>> 
>> ___
>> Developers mailing list
>> Developers@clusterlabs.org
>> http://clusterlabs.org/mailman/listinfo/developers
>> 
> 
> 
> -- 
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?
> 
> ___
> Developers mailing list
> Developers@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/developers


___
Developers mailing list
Developers@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] OCF_RESKEY_CRM_meta_notify_active_* always empty

2016-07-30 Thread Andrew Beekhof


Sent from my iPhone

> On 30 Jul 2016, at 8:32 AM, Ken Gaillot  wrote:
> 
> I finally had time to investigate this, and it definitely is broken.
> 
> The only existing heartbeat RA to use the *_notify_active_* variables is
> Filesystem, and it only does so for OCFS2 on SLES10, which didn't even
> ship pacemaker,

I'm pretty sure it did

> so I'm guessing it's been broken from the beginning of
> pacemaker.
> 
> The fix looks straightforward, so I should be able to take care of it soon.
> 
> Filed bug http://bugs.clusterlabs.org/show_bug.cgi?id=5295
> 
>> On 05/08/2016 04:57 AM, Jehan-Guillaume de Rorthais wrote:
>> Le Fri, 6 May 2016 15:41:11 -0500,
>> Ken Gaillot  a écrit :
>> 
 On 05/03/2016 05:30 PM, Jehan-Guillaume de Rorthais wrote:
 Le Tue, 3 May 2016 21:10:12 +0200,
 Jehan-Guillaume de Rorthais  a écrit :
 
> Le Mon, 2 May 2016 17:59:55 -0500,
> Ken Gaillot  a écrit :
> 
>>> On 04/28/2016 04:47 AM, Jehan-Guillaume de Rorthais wrote:
>>> Hello all,
>>> 
>>> While testing and experimenting with our RA for PostgreSQL, I found the
>>> meta_notify_active_* variables seem to be always empty. Here is an example of
>>> these variables as they are seen from our RA during a
>>> migration/switchover:
>>> 
>>> 
>>>  {
>>>'type' => 'pre',
>>>'operation' => 'demote',
>>>'active' => [],
>>>'inactive' => [],
>>>'start' => [],
>>>'stop' => [],
>>>'demote' => [
>>>  {
>>>'rsc' => 'pgsqld:1',
>>>'uname' => 'hanode1'
>>>  }
>>>],
>>> 
>>>'master' => [
>>>  {
>>>'rsc' => 'pgsqld:1',
>>>'uname' => 'hanode1'
>>>  }
>>>],
>>> 
>>>'promote' => [
>>>   {
>>> 'rsc' => 'pgsqld:0',
>>> 'uname' => 'hanode3'
>>>   }
>>> ],
>>>'slave' => [
>>> {
>>>   'rsc' => 'pgsqld:0',
>>>   'uname' => 'hanode3'
>>> },
>>> {
>>>   'rsc' => 'pgsqld:2',
>>>   'uname' => 'hanode2'
>>> }
>>>   ],
>>> 
>>>  }
>>> 
>>> In case this comes from our side, here is code building this:
>>> 
>>>  
>>> https://github.com/dalibo/PAF/blob/6e86284bc647ef1e81f01f047f1862e40ba62906/lib/OCF_Functions.pm#L444
>>> 
>>> But looking at the variable itself in debug logs, I always find it 
>>> empty,
>>> in various situations (switchover, recover, failover).
>>> 
>>> If I understand the documentation correctly, I would expect 'active' to
>>> list all three resources, shouldn't it? Currently, to bypass this, 
>>> we
>>> consider: active == master + slave
>> 
>> You're right, it should. The pacemaker code that generates the "active"
>> variables is the same used for "demote" etc., so it seems unlikely the
>> issue is on pacemaker's side. Especially since your code treats active
>> etc. differently from demote etc., it seems like it must be in there
>> somewhere, but I don't see where.
> 
> The code treats active, inactive, start and stop all together, for any
> cloned resource. If the resource is a multistate, it adds promote, demote,
> slave and master.
> 
> Note that from this piece of code, the 7 other notify vars are set
> correctly: start, stop, inactive, promote, demote, slave, master. Only
> active is always missing.
> 
> I'll investigate and try to find where the bug is hiding.
 
 So I added a piece of code to dump **all** the environment variables to
 a temp file as early as possible **to avoid any interaction with our Perl
 module** in the code of the RA, i.e.:
 
  BEGIN {
use Time::HiRes qw(time);
my $now = time;
open my $fh, ">", "/tmp/test-$now.env.txt";
printf($fh "%-20s = ''%s''\n", $_, $ENV{$_}) foreach sort keys %ENV;
  }
 
 Then I started my cluster and set maintenance-mode=false while no resources
 were running. So the debug files contain the probe action, the starts on all
 nodes, one promote on the master and the first monitors. The "*active"
 variables are always empty everywhere in the cluster. Find attached the
 result of the following command on the master node:
 
  for i in test-*; do echo "= $i ="; grep OCF_ $i; done >
 debug-env.txt
 
 I'm using Pacemaker 1.1.13-10.el7_2.2-44eb2dd under CentOS 7.2.1511.
 
 For completeness, I added the Pacemaker configuration I use for my 3 

Re: [ClusterLabs Developers] OCF_RESKEY_CRM_meta_notify_active_* always empty

2016-07-30 Thread Andrew Beekhof
Urgh. I must be confusing it with SLES11. 
In any case, the first version of pacemaker was identical to the last heartbeat 
crm. 

I don't recall the ocfs2 agent changing design while I was there, so SLES11 may 
be broken too.

Sent from my iPhone

> On 30 Jul 2016, at 8:51 AM, Ken Gaillot <kgail...@redhat.com> wrote:
> 
>> On 07/29/2016 05:41 PM, Andrew Beekhof wrote:
>> 
>> 
>> Sent from my iPhone
>> 
>>> On 30 Jul 2016, at 8:32 AM, Ken Gaillot <kgail...@redhat.com> wrote:
>>> 
>>> I finally had time to investigate this, and it definitely is broken.
>>> 
>>> The only existing heartbeat RA to use the *_notify_active_* variables is
>>> Filesystem, and it only does so for OCFS2 on SLES10, which didn't even
>>> ship pacemaker,
>> 
>> I'm pretty sure it did
> 
> All I could find was:
> 
> "SLES 10 did not yet ship pacemaker, but heartbeat with the builtin crm"
> 
> http://oss.clusterlabs.org/pipermail/pacemaker/2014-July/022232.html
> 
> I'm sure people were compiling it, and ClusterLabs probably even
> provided a repo, but it looks like sles didn't ship it.
> 
> The issue is that the code that builds the active list checks for role
> RSC_ROLE_STARTED rather than RSC_ROLE_SLAVE + RSC_ROLE_MASTER, so I
> don't think it ever would have worked.
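
An illustrative sketch of the check being described (hedged: the rsc_role_e
values follow pacemaker 1.1, and this is not the literal patch):

    /* Stand-in for pacemaker's enum rsc_role_e (sketch only). */
    enum rsc_role_e { RSC_ROLE_UNKNOWN, RSC_ROLE_STOPPED, RSC_ROLE_STARTED,
                      RSC_ROLE_SLAVE, RSC_ROLE_MASTER };

    /* Before: an instance only counted as active if its role was
     * RSC_ROLE_STARTED, which instances of a multistate resource
     * never report. */
    static int is_active_old(enum rsc_role_e role)
    {
        return role == RSC_ROLE_STARTED;
    }

    /* After: count Slave and Master instances as active too, so the
     * *_notify_active_* lists get populated for multistate resources. */
    static int is_active_fixed(enum rsc_role_e role)
    {
        return role == RSC_ROLE_STARTED
            || role == RSC_ROLE_SLAVE
            || role == RSC_ROLE_MASTER;
    }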
> 
>> 
>>> so I'm guessing it's been broken from the beginning of
>>> pacemaker.
>>> 
>>> The fix looks straightforward, so I should be able to take care of it soon.
>>> 
>>> Filed bug http://bugs.clusterlabs.org/show_bug.cgi?id=5295
>>> 
>>>> On 05/08/2016 04:57 AM, Jehan-Guillaume de Rorthais wrote:
>>>> Le Fri, 6 May 2016 15:41:11 -0500,
>>>> Ken Gaillot <kgail...@redhat.com> a écrit :
>>>> 
>>>>>> On 05/03/2016 05:30 PM, Jehan-Guillaume de Rorthais wrote:
>>>>>> Le Tue, 3 May 2016 21:10:12 +0200,
>>>>>> Jehan-Guillaume de Rorthais <j...@dalibo.com> a écrit :
>>>>>> 
>>>>>>> Le Mon, 2 May 2016 17:59:55 -0500,
>>>>>>> Ken Gaillot <kgail...@redhat.com> a écrit :
>>>>>>> 
>>>>>>>>> On 04/28/2016 04:47 AM, Jehan-Guillaume de Rorthais wrote:
>>>>>>>>> Hello all,
>>>>>>>>> 
>>>>>>>>> While testing and experimenting with our RA for PostgreSQL, I found the
>>>>>>>>> meta_notify_active_* variables seem to be always empty. Here is an example 
>>>>>>>>> of
>>>>>>>>> these variables as they are seen from our RA during a
>>>>>>>>> migration/switchover:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> {
>>>>>>>>>   'type' => 'pre',
>>>>>>>>>   'operation' => 'demote',
>>>>>>>>>   'active' => [],
>>>>>>>>>   'inactive' => [],
>>>>>>>>>   'start' => [],
>>>>>>>>>   'stop' => [],
>>>>>>>>>   'demote' => [
>>>>>>>>> {
>>>>>>>>>   'rsc' => 'pgsqld:1',
>>>>>>>>>   'uname' => 'hanode1'
>>>>>>>>> }
>>>>>>>>>   ],
>>>>>>>>> 
>>>>>>>>>   'master' => [
>>>>>>>>> {
>>>>>>>>>   'rsc' => 'pgsqld:1',
>>>>>>>>>   'uname' => 'hanode1'
>>>>>>>>> }
>>>>>>>>>   ],
>>>>>>>>> 
>>>>>>>>>   'promote' => [
>>>>>>>>>  {
>>>>>>>>>'rsc' => 'pgsqld:0',
>>>>>>>>>'uname' => 'hanode3'
>>>>>>>>>  }
>>>>>>>>>],
>>>>>>>>>   'slave' => [
>>>>>>>>>{
>>>>>>>>>  'rsc' => 'pgsqld:0',
>>>>>>>>>  'uname' => 'hanode3'
>>>>>>>>>},

Re: [ClusterLabs Developers] Proposed future feature: multiple notification scripts

2015-12-06 Thread Andrew Beekhof

> On 5 Dec 2015, at 4:22 AM, Ken Gaillot  wrote:
> 
> On 12/04/2015 10:59 AM, Jan Pokorný wrote:
>> Btw. was any other architectural approach considered?  It's sad
>> that multiplatform IPC, which is what might be better to handle
>> more or less continuous one-way stream of updates

but requires us to build in knowledge of, and a library dependency on, every 
possible target.
we’ve already seen how well that went for just SNMP and SMTP, let alone 
whatever current management fad has captured people’s attention.

by contrast, there is always a CLI tool.

>> (would the exec
>> mechanism apply some kind of rate-limiting to prevent exhaustion?),
>> but what about
> 
> There is no rate limiting on the Pacemaker end. If there winds up
> being a big demand for it, we can look into it, but that is more
> likely to be useful within the script itself (a script that notifies
> another service of status changes likely does not want rate limiting,
> but an SMS notifier sure might).

agreed. this belongs in the scripts or some intermediate party.

also, by definition, the notification overhead is less than that of the 
resources they are for.
so the node itself should be ok; the person on the other end… that is why 
people install alert management systems.

> 
>> long-lived unix sockets or named pipes?  Especially the latter
>> might be doable in the agent just in shell + coreutils and other
>> basic tooling and might play well with file discovery approach.
> 
> Simple shell scripts are often the limit of sysadmins' abilities in
> this area, so the less they have to know/do the better.
> 
> ___
> Developers mailing list
> Developers@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/developers


___
Developers mailing list
Developers@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] Proposed future feature: multiple notification scripts

2015-12-02 Thread Andrew Beekhof

> On 3 Dec 2015, at 10:23 AM, Ken Gaillot <kgail...@redhat.com> wrote:
> 
> This will be of interest to cluster front-end developers and anyone who
> needs event notifications ...
> 
> One of the new features in Pacemaker 1.1.14 will be built-in
> notifications of cluster events, as described by Andrew Beekhof on That
> Cluster Guy blog:
> http://blog.clusterlabs.org/blog/2015/reliable-notifications/
> 
> For a future version, we're considering extending that to allow multiple
> notification scripts, each with multiple recipients. This would require
> a significant change in the CIB. Instead of a simple cluster property,
> our current idea is a new configuration section in the CIB, probably
> along these lines:
> 
>    [XML example lost in the archive: a new top-level CIB section defining
>    one or more notification scripts, each with a path and its own list of
>    recipient entries]
> 
> The recipient values would be passed to the script as command-line
> arguments (ex. "/my/script.sh m...@example.com").
> 
> For backward compatibility, the (brand new!) notification-agent and
> notification-recipient cluster properties would be kept as deprecated
> shortcuts for a single notify script and recipient.

Actually, that didn't make it into an upstream release.
So we could just pretend it never happened :)

Sure, it's in RHEL, but we haven’t advertised it yet and it can be our problem to 
do backwards compatibility for - no need to inflict that on upstream.

> 
> Also for backward compatibility, the first recipient would be passed to
> the script as the CRM_notify_recipient environment variable.
> 
> This proposal came about because the new notification capability has
> turned out to be useful enough that people sometimes want to use it for
> multiple purposes, e.g. email an administrator, and notify some software
> that an event occurred. Trying to fit unrelated actions in one
> notification script (or a script that calls multiple other scripts) has
> obvious pitfalls, so this would make it easier on sysadmins.
> 
> Another advantage will be a configurable timeout (1.1.14 will have a
> hardcoded 5-minute timeout for notification scripts).
> 
> The crm_attribute command and the various cluster front-ends would need
> to be modified to handle the new configuration syntax.
> 
> This is all in the idea stage (development is still a ways off), so any
> comments, suggestions, criticisms, etc. are welcome.
> -- 
> Ken Gaillot <kgail...@redhat.com>
> 
> ___
> Developers mailing list
> Developers@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/developers


___
Developers mailing list
Developers@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] migrate-to and migrate-from for moving Master/Slave roles ?

2015-12-01 Thread Andrew Beekhof

> On 26 Nov 2015, at 11:52 AM, Jehan-Guillaume de Rorthais  
> wrote:
> 
> Hi guys,
> 
> While working on our pgsqlms agent[1], we are now studying how to control all
> the steps of a switchover process from the resource agent. 
> 
> The tricky part here is the 2nd step of a successful switchover with PostgreSQL
> (9.3+):
>  (1) shutdown the master first
>  (2) make sure the designated slave received **everything** from the old 
> master

How can you achieve (2) if (1) has already occurred?
There’s no-one for the designated slave to talk to in the case of errors...

>  (3) promote the designated slave as master
>  (4) start the old master as slave

(4) is pretty tricky.  Assuming you use master/slave, it's supposed to be in 
this state already after the demote in step (1).
If you’re just using clones, then you’re in even more trouble because pacemaker 
either wouldn’t have stopped it or won’t want to start it again.

See more below.

> As far as we understand Pacemaker, migrate-to and migrate-from capabilities
> allow us to distinguish whether we are moving a resource because of a failure or for 
> a
> controlled switchover situation. Unfortunately, these capabilities are ignored
> for cloned and multi-state resources…

Yeah, this isn’t really the right use-case.
You need to be looking more at the promote/demote cycle.

If you turn on notifications, then in a graceful switchover (e.g. the node is 
going into standby) you will get information about which node has been selected 
to become the new master when calling demote on the old master.  Perhaps you 
could ensure (2) while performing (1).

It's not ideal, but you could have (4) happen in the post-promote notification.
Notify actions aren’t /supposed/ to change resource state but it has been done 
before.

> 
> Because of this restriction, we currently don't know from the resource agent
> code if we should check the designated slave received everything from the old
> master (controlled switchover) or not (we lost the master). In case of
> controlled switchover, if the designated slave did not receive everything 
> from
> the master, we must abort the switchover.
> 
> A workaround we could imagine would be to set a special cluster attribute
> manually (using crm_attribute) to signal the agent we are going to make a
> controlled switchover.
> 
> But I bet the cleaner way would be to use migrate-to and migrate-from
> capabilities. Did we miss something about them? Is there some plan to support
> moving a Master/Slave role using migrate-to and migrate-from at some point? 
> Any
> other proposals? Ideas?
> 
> [1] see "multistate" folder in https://github.com/dalibo/pgsql-resource-agent
> 
> Regards,
> -- 
> Jehan-Guillaume de Rorthais
> Dalibo
> 
> ___
> Developers mailing list
> Developers@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/developers


___
Developers mailing list
Developers@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/developers


Re: [ClusterLabs Developers] --verbose breaks stonithd + some fencing agents

2015-10-14 Thread Andrew Beekhof

> On 15 Oct 2015, at 12:10 PM, Adam Spiers  wrote:
> 
> Hi all,
> 
> I'm certainly no expert on stonithd or the way it interfaces with
> fence-agents, but I think I found a bug in either stonithd or
> fencing.py.
> 
> If a fencing agent is invoked with the --verbose CLI argument (or
> something like 'verbose=1' via STDIN), then any invocations of
> logging.debug() will cause output to STDERR:
> 
>  
> https://github.com/ClusterLabs/fence-agents/blob/master/fence/agents/lib/fencing.py.py#L640
> 
> This confuses stonithd, because it dup(2)s STDOUT and STDERR to the
> same fd which is the writeable end of a pipe used by stonithd to read
> output from the forked child which runs the fencing agent:
> 
>  
> https://github.com/ClusterLabs/pacemaker/blob/master/lib/fencing/st_client.c#L782
> 
> Therefore from the point of view of stonithd, debug output on STDERR
> gets intermingled with "real" output on STDOUT,

That seems like a mistake.
We shouldn’t be duping them to the same descriptor - it should be like the lrmd 
which captures stderr but only logs it if there is an error (we can’t 
necessarily assume agents are already logging to syslog).

Some trawling through git suggests that we inherited this from libfence back in 
2009.
Time to modernise it.

> and when it comes to
> parse this, the result is warnings in the logs beginning:
> 
>  stonith-ng[5399]:  warning: Could not parse ...
> 
> Since we already log to syslog, I wonder whether we need to also
> log to STDERR, so my first instinct was this fix:
> 
>  
> https://github.com/aspiers/fence-agents/commit/4c7148b8046eb9cef950811c26fab73672f403bc
> 
> However, subsequently I realised that the root cause is the way
> stonithd mixes STDOUT and STDERR from the child fencing agent process
> together, so now I'm wondering if it would be better to change
> lib/fencing/st_client.c to create a third pipe for handling STDERR
> independently.
> 
> Thoughts?

Exactly :)
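
A minimal sketch of that third-pipe approach, assuming plain POSIX calls (not
the actual st_client.c code): stdout and stderr get separate descriptors, so
the agent's output can be parsed while stderr is held back and only logged
when it is useful.

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int out_fd[2], err_fd[2];
        char buf[256];
        ssize_t n;
        pid_t pid;

        if (pipe(out_fd) < 0 || pipe(err_fd) < 0) {
            perror("pipe");
            return 1;
        }

        pid = fork();
        if (pid == 0) {
            /* child: wire stdout and stderr to *different* write ends */
            dup2(out_fd[1], STDOUT_FILENO);
            dup2(err_fd[1], STDERR_FILENO);
            close(out_fd[0]); close(out_fd[1]);
            close(err_fd[0]); close(err_fd[1]);
            execlp("sh", "sh", "-c",
                   "echo agent-output; echo debug-noise >&2", (char *) NULL);
            _exit(127);
        }

        /* parent: consume the streams independently (a real implementation
         * would poll() both ends to avoid blocking on a full pipe) */
        close(out_fd[1]);
        close(err_fd[1]);
        while ((n = read(out_fd[0], buf, sizeof(buf) - 1)) > 0) {
            buf[n] = '\0';
            printf("stdout: %s", buf);  /* parse as the agent's result */
        }
        while ((n = read(err_fd[0], buf, sizeof(buf) - 1)) > 0) {
            buf[n] = '\0';
            printf("stderr: %s", buf);  /* log only if the agent failed */
        }
        waitpid(pid, NULL, 0);
        return 0;
    }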


___
Developers mailing list
Developers@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/developers