On Tue, 3 Apr 2018 09:58:50 +1000
Andrew Beekhof <abeek...@redhat.com> wrote:
> On Fri, Mar 30, 2018 at 8:36 PM, Jehan-Guillaume de Rorthais <
> j...@dalibo.com> wrote:  
> > On Thu, 29 Mar 2018 09:32:41 +1100
> > Andrew Beekhof <abeek...@redhat.com> wrote:  
> > > On Thu, Mar 29, 2018 at 8:07 AM, Jehan-Guillaume de Rorthais <  
> > > j...@dalibo.com> wrote:  
[...]
> > > Though by now there is surely a decent library for logging to files with
> > > sub-second timestamps - if we could incorporate that into libqb and have
> > > corosync use it too...  
> >
> > In my opinion, this is neither the role of libqb  
> 
> 
> libqb has the logging library that pacemaker and corosync use.
> it is absolutely where this change should happen

I meant that this could be handled 100% by some external dedicated daemon, eg.
syslog or journalctl.

I was thinking about code simplification.

[...]

> > > then we could consider 1 log per daemon.
> > > In which case, the outcome of the PREFIX-SUFFIX discussion above could
> > > instead be used for /var/log/pacemaker/SUFFIX  
> >
> > I think the best would be to have one log for Corosync, one log for
> > Pacemaker (and all its sub-process/childs).
> >
> > Another good path toward understandable log file would be to hide what
> > process is speaking. Experienced user will still know that "LOG: setting
> > failcount to 3" comes from CRMd and "DEBUG1: failcount setted to 3" comes
> > from attrd.
> >
> > However, this would probably be a mess...because again, the cause might be
> > logged AFTER the effects/reaction :/
> 
> why?  i've never seen that be the case

Please find in attachment a demonstration of such behavior I found last week.
Note that this comes from a Sles 12 SP1 using Pacemaker 1.1.13...People there
were not able to upgrade the servers before we built the PoC together.

First column is the order in the log file. Second column is how I would expect
the messages to appear in the log.

Eg. I would expect L.11

  "pengine: notice: process_pe_message: Calculated Transition 29: [...]"

Before CRMd begin to process it at L.6-10.

Another exemple, I would expect LRMd L.35:

  "lrmd:  notice: log_finished:  finished - rsc:pgsqld action:notify"

Before the CRMd receive the result L.26...

Maybe this is something fixed in 1.1.18 or 2.0.0, I just couldn't find commit
messages related to this when searching through them quickly.

> > Maybe the solution is to log only messages from CRMD, where all the
> > orchestration comes from. Everything else might go to some debug level if
> > needed.
> 
> sorry, that is a terrible idea

I was throwing random ideas as I'm not familiar with internal architecture.
Maybe it should be pacemakerd to gather messages from all other messages and
spit them to stderr so they are captured by journald or redirected to a file...

Regards,
1   1   14:15:50 slesHAtest1 pengine:    info: LogActions:    Leave   fence_sleshatest2       (Started slesHAtest1)
2   2   14:15:50 slesHAtest1 pengine:  notice: LogActions:    Stop    admin_addr      (slesHAtest2)
3   3   14:15:50 slesHAtest1 pengine:    info: LogActions:    Leave   fence_sleshatest1       (Started slesHAtest2)
4   4   14:15:50 slesHAtest1 pengine:  notice: LogActions:    Demote  pgsqld:0        (Master -> Stopped slesHAtest2)
5   5   14:15:50 slesHAtest1 pengine:  notice: LogActions:    Stop    pgsqld:1        (slesHAtest1)
6   8   14:15:50 slesHAtest1    crmd:    info: do_state_transition:   State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
7   9   14:15:50 slesHAtest1    crmd:  notice: do_te_invoke:  Processing graph 29 (ref=pe_calc-dc-1522239350-121) derived from /var/lib/pacemaker/pengine/pe-input-105.bz2
8   11  14:15:50 slesHAtest1    crmd:  notice: te_rsc_command:        Initiating action 46: notify pgsqld_pre_notify_demote_0 on slesHAtest2
9   12  14:15:50 slesHAtest1    crmd:  notice: te_rsc_command:        Initiating action 48: notify pgsqld_pre_notify_demote_0 on slesHAtest1 (local)
10  10  14:15:50 slesHAtest1    crmd:    info: do_lrm_rsc_op: Performing key=48:29:0:99751e86-20f1-4ee7-83bf-6428b030ebc9 op=pgsqld_notify_0
11  6   14:15:50 slesHAtest1 pengine:  notice: process_pe_message:    Calculated Transition 29: /var/lib/pacemaker/pengine/pe-input-105.bz2
12  7   14:15:50 slesHAtest1     cib:    info: cib_process_request:   Forwarding cib_modify operation for section status to master (origin=local/crmd/161)
13  13  14:15:50 slesHAtest1    lrmd:  notice: log_execute:   executing - rsc:pgsqld action:notify call_id:59
14  14  14:15:50 slesHAtest1     cib:    info: cib_perform_op:        Diff: --- 0.88.0 2
15  15  14:15:50 slesHAtest1     cib:    info: cib_perform_op:        Diff: +++ 0.88.1 (null)
16  16  14:15:50 slesHAtest1     cib:    info: cib_perform_op:        +  /cib:  @num_updates=1
17  17  14:15:50 slesHAtest1     cib:    info: cib_perform_op:        +  /cib/status/node_state[@id='168692358']/lrm[@id='168692358']/lrm_resources/lrm_resource[@id='pgsqld']/lrm_rsc_op[@id='pgsqld_pre_notify_demote_0']:  @transition-key=46:29:0:99751e86-20f1-4ee7-83bf-6428b030ebc9, @transition-magic=0:0;46:29:0:99751e86-20f1-4ee7-83bf-6428b030ebc9, @last-run=1522239350, @last-rc-change=1522239350
18  18  14:15:50 slesHAtest1     cib:    info: cib_process_request:   Completed cib_modify operation for section status: OK (rc=0, origin=slesHAtest2/crmd/65, version=0.88.1)
19  25  14:15:50 slesHAtest1    crmd:    info: match_graph_event:     Action pgsqld_notify_0 (46) confirmed on slesHAtest2 (rc=0)
20  19  14:15:50 slesHAtest1     cib:    info: cib_file_backup:       Archived previous version as /var/lib/pacemaker/cib/cib-41.raw
21  20  14:15:50 slesHAtest1     cib:    info: cib_perform_op:        Diff: --- 0.88.1 2
22  21  14:15:50 slesHAtest1     cib:    info: cib_perform_op:        Diff: +++ 0.88.2 (null)
23  22  14:15:50 slesHAtest1     cib:    info: cib_perform_op:        +  /cib:  @num_updates=2
24  23  14:15:50 slesHAtest1     cib:    info: cib_perform_op:        +  /cib/status/node_state[@id='168692312']/lrm[@id='168692312']/lrm_resources/lrm_resource[@id='pgsqld']/lrm_rsc_op[@id='pgsqld_pre_notify_demote_0']:  @transition-key=48:29:0:99751e86-20f1-4ee7-83bf-6428b030ebc9, @transition-magic=0:0;48:29:0:99751e86-20f1-4ee7-83bf-6428b030ebc9, @last-run=1522239350, @last-rc-change=1522239350
25  24  14:15:50 slesHAtest1     cib:    info: cib_process_request:   Completed cib_modify operation for section status: OK (rc=0, origin=slesHAtest1/crmd/161, version=0.88.2)
26  27  14:15:50 slesHAtest1    crmd:    info: match_graph_event:     Action pgsqld_notify_0 (48) confirmed on slesHAtest1 (rc=0)
27  28  14:15:50 slesHAtest1    crmd:  notice: te_rsc_command:        Initiating action 12: demote pgsqld_demote_0 on slesHAtest2
28  29  14:15:50 slesHAtest1     cib:    info: cib_file_write_with_digest:    Wrote version 0.88.0 of the CIB to disk (digest: f367ff8bc5683befa7959044bed1a83e)
29  30  14:15:50 slesHAtest1     cib:    info: cib_perform_op:        Diff: --- 0.88.2 2
30  31  14:15:50 slesHAtest1     cib:    info: cib_perform_op:        Diff: +++ 0.88.3 (null)
31  32  14:15:50 slesHAtest1     cib:    info: cib_perform_op:        +  /cib:  @num_updates=3
32  33  14:15:50 slesHAtest1     cib:    info: cib_perform_op:        +  /cib/status/node_state[@id='168692358']/lrm[@id='168692358']/lrm_resources/lrm_resource[@id='pgsqld']/lrm_rsc_op[@id='pgsqld_last_0']:  @operation_key=pgsqld_demote_0, @operation=demote, @transition-key=12:29:0:99751e86-20f1-4ee7-83bf-6428b030ebc9, @transition-magic=-1:193;12:29:0:99751e86-20f1-4ee7-83bf-6428b030ebc9, @call-id=-1, @rc-code=193, @op-status=-1, @last-run=1522239350, @last-rc-change=1522239350, @exec-time=0, @queue-ti
33  34  14:15:50 slesHAtest1     cib:    info: cib_process_request:   Completed cib_modify operation for section status: OK (rc=0, origin=slesHAtest2/crmd/66, version=0.88.3)
34  35  14:15:50 slesHAtest1     cib:    info: cib_file_write_with_digest:    Reading cluster configuration file /var/lib/pacemaker/cib/cib.R6jUCM (digest: /var/lib/pacemaker/cib/cib.CknL33)
35  26  14:15:50 slesHAtest1    lrmd:  notice: log_finished:  finished - rsc:pgsqld action:notify call_id:59 pid:24149 exit-code:0 exec-time:174ms queue-time:0ms
_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

Reply via email to