On Tue, 3 Apr 2018 09:58:50 +1000
Andrew Beekhof <abeek...@redhat.com> wrote:
> On Fri, Mar 30, 2018 at 8:36 PM, Jehan-Guillaume de Rorthais <
> j...@dalibo.com> wrote:
> > On Thu, 29 Mar 2018 09:32:41 +1100
> > Andrew Beekhof <abeek...@redhat.com> wrote:
> > > On Thu, Mar 29, 2018 at 8:07 AM, Jehan-Guillaume de Rorthais <
> > > j...@dalibo.com> wrote:
[...]
> > > Though by now there is surely a decent library for logging to files with
> > > sub-second timestamps - if we could incorporate that into libqb and have
> > > corosync use it too...
> >
> > In my opinion, this is neither the role of libqb
>
>
> libqb has the logging library that pacemaker and corosync use.
> it is absolutely where this change should happen
I meant that this could be handled 100% by some external dedicated daemon, eg.
syslog or journalctl.
I was thinking about code simplification.
[...]
> > > then we could consider 1 log per daemon.
> > > In which case, the outcome of the PREFIX-SUFFIX discussion above could
> > > instead be used for /var/log/pacemaker/SUFFIX
> >
> > I think the best would be to have one log for Corosync, one log for
> > Pacemaker (and all its sub-process/childs).
> >
> > Another good path toward understandable log file would be to hide what
> > process is speaking. Experienced user will still know that "LOG: setting
> > failcount to 3" comes from CRMd and "DEBUG1: failcount setted to 3" comes
> > from attrd.
> >
> > However, this would probably be a mess...because again, the cause might be
> > logged AFTER the effects/reaction :/
>
> why? i've never seen that be the case
Please find in attachment a demonstration of such behavior I found last week.
Note that this comes from a Sles 12 SP1 using Pacemaker 1.1.13...People there
were not able to upgrade the servers before we built the PoC together.
First column is the order in the log file. Second column is how I would expect
the messages to appear in the log.
Eg. I would expect L.11
"pengine: notice: process_pe_message: Calculated Transition 29: [...]"
Before CRMd begin to process it at L.6-10.
Another exemple, I would expect LRMd L.35:
"lrmd: notice: log_finished: finished - rsc:pgsqld action:notify"
Before the CRMd receive the result L.26...
Maybe this is something fixed in 1.1.18 or 2.0.0, I just couldn't find commit
messages related to this when searching through them quickly.
> > Maybe the solution is to log only messages from CRMD, where all the
> > orchestration comes from. Everything else might go to some debug level if
> > needed.
>
> sorry, that is a terrible idea
I was throwing random ideas as I'm not familiar with internal architecture.
Maybe it should be pacemakerd to gather messages from all other messages and
spit them to stderr so they are captured by journald or redirected to a file...
Regards,
1 1 14:15:50 slesHAtest1 pengine: info: LogActions: Leave fence_sleshatest2 (Started slesHAtest1)
2 2 14:15:50 slesHAtest1 pengine: notice: LogActions: Stop admin_addr (slesHAtest2)
3 3 14:15:50 slesHAtest1 pengine: info: LogActions: Leave fence_sleshatest1 (Started slesHAtest2)
4 4 14:15:50 slesHAtest1 pengine: notice: LogActions: Demote pgsqld:0 (Master -> Stopped slesHAtest2)
5 5 14:15:50 slesHAtest1 pengine: notice: LogActions: Stop pgsqld:1 (slesHAtest1)
6 8 14:15:50 slesHAtest1 crmd: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
7 9 14:15:50 slesHAtest1 crmd: notice: do_te_invoke: Processing graph 29 (ref=pe_calc-dc-1522239350-121) derived from /var/lib/pacemaker/pengine/pe-input-105.bz2
8 11 14:15:50 slesHAtest1 crmd: notice: te_rsc_command: Initiating action 46: notify pgsqld_pre_notify_demote_0 on slesHAtest2
9 12 14:15:50 slesHAtest1 crmd: notice: te_rsc_command: Initiating action 48: notify pgsqld_pre_notify_demote_0 on slesHAtest1 (local)
10 10 14:15:50 slesHAtest1 crmd: info: do_lrm_rsc_op: Performing key=48:29:0:99751e86-20f1-4ee7-83bf-6428b030ebc9 op=pgsqld_notify_0
11 6 14:15:50 slesHAtest1 pengine: notice: process_pe_message: Calculated Transition 29: /var/lib/pacemaker/pengine/pe-input-105.bz2
12 7 14:15:50 slesHAtest1 cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/crmd/161)
13 13 14:15:50 slesHAtest1 lrmd: notice: log_execute: executing - rsc:pgsqld action:notify call_id:59
14 14 14:15:50 slesHAtest1 cib: info: cib_perform_op: Diff: --- 0.88.0 2
15 15 14:15:50 slesHAtest1 cib: info: cib_perform_op: Diff: +++ 0.88.1 (null)
16 16 14:15:50 slesHAtest1 cib: info: cib_perform_op: + /cib: @num_updates=1
17 17 14:15:50 slesHAtest1 cib: info: cib_perform_op: + /cib/status/node_state[@id='168692358']/lrm[@id='168692358']/lrm_resources/lrm_resource[@id='pgsqld']/lrm_rsc_op[@id='pgsqld_pre_notify_demote_0']: @transition-key=46:29:0:99751e86-20f1-4ee7-83bf-6428b030ebc9, @transition-magic=0:0;46:29:0:99751e86-20f1-4ee7-83bf-6428b030ebc9, @last-run=1522239350, @last-rc-change=1522239350
18 18 14:15:50 slesHAtest1 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=slesHAtest2/crmd/65, version=0.88.1)
19 25 14:15:50 slesHAtest1 crmd: info: match_graph_event: Action pgsqld_notify_0 (46) confirmed on slesHAtest2 (rc=0)
20 19 14:15:50 slesHAtest1 cib: info: cib_file_backup: Archived previous version as /var/lib/pacemaker/cib/cib-41.raw
21 20 14:15:50 slesHAtest1 cib: info: cib_perform_op: Diff: --- 0.88.1 2
22 21 14:15:50 slesHAtest1 cib: info: cib_perform_op: Diff: +++ 0.88.2 (null)
23 22 14:15:50 slesHAtest1 cib: info: cib_perform_op: + /cib: @num_updates=2
24 23 14:15:50 slesHAtest1 cib: info: cib_perform_op: + /cib/status/node_state[@id='168692312']/lrm[@id='168692312']/lrm_resources/lrm_resource[@id='pgsqld']/lrm_rsc_op[@id='pgsqld_pre_notify_demote_0']: @transition-key=48:29:0:99751e86-20f1-4ee7-83bf-6428b030ebc9, @transition-magic=0:0;48:29:0:99751e86-20f1-4ee7-83bf-6428b030ebc9, @last-run=1522239350, @last-rc-change=1522239350
25 24 14:15:50 slesHAtest1 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=slesHAtest1/crmd/161, version=0.88.2)
26 27 14:15:50 slesHAtest1 crmd: info: match_graph_event: Action pgsqld_notify_0 (48) confirmed on slesHAtest1 (rc=0)
27 28 14:15:50 slesHAtest1 crmd: notice: te_rsc_command: Initiating action 12: demote pgsqld_demote_0 on slesHAtest2
28 29 14:15:50 slesHAtest1 cib: info: cib_file_write_with_digest: Wrote version 0.88.0 of the CIB to disk (digest: f367ff8bc5683befa7959044bed1a83e)
29 30 14:15:50 slesHAtest1 cib: info: cib_perform_op: Diff: --- 0.88.2 2
30 31 14:15:50 slesHAtest1 cib: info: cib_perform_op: Diff: +++ 0.88.3 (null)
31 32 14:15:50 slesHAtest1 cib: info: cib_perform_op: + /cib: @num_updates=3
32 33 14:15:50 slesHAtest1 cib: info: cib_perform_op: + /cib/status/node_state[@id='168692358']/lrm[@id='168692358']/lrm_resources/lrm_resource[@id='pgsqld']/lrm_rsc_op[@id='pgsqld_last_0']: @operation_key=pgsqld_demote_0, @operation=demote, @transition-key=12:29:0:99751e86-20f1-4ee7-83bf-6428b030ebc9, @transition-magic=-1:193;12:29:0:99751e86-20f1-4ee7-83bf-6428b030ebc9, @call-id=-1, @rc-code=193, @op-status=-1, @last-run=1522239350, @last-rc-change=1522239350, @exec-time=0, @queue-ti
33 34 14:15:50 slesHAtest1 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=slesHAtest2/crmd/66, version=0.88.3)
34 35 14:15:50 slesHAtest1 cib: info: cib_file_write_with_digest: Reading cluster configuration file /var/lib/pacemaker/cib/cib.R6jUCM (digest: /var/lib/pacemaker/cib/cib.CknL33)
35 26 14:15:50 slesHAtest1 lrmd: notice: log_finished: finished - rsc:pgsqld action:notify call_id:59 pid:24149 exit-code:0 exec-time:174ms queue-time:0ms
_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org