On Oct 14, 2008, at 1:59 PM, Rainer Traut wrote:

Answering myself: that was obviously the wrong part of the log.
Here we go.

Current status: one node still has the older release.

working node n01:

# rpm -qa|grep heartbeat
heartbeat-common-2.99.0-3.1
heartbeat-resources-2.99.0-3.1
heartbeat-2.99.0-3.1
heartbeat-ldirectord-2.99.0-3.1
pacemaker-heartbeat-0.6.6-18.1
# rpm -qa|grep pacemaker
pacemaker-heartbeat-0.6.6-18.1
pacemaker-pygui-1.4-7.2

non-working node n02, including logs:

# rpm -qa|grep heartbeat
heartbeat-common-2.99.2-2.1
heartbeat-ldirectord-2.99.2-2.1
heartbeat-2.99.2-2.1
heartbeat-resources-2.99.2-2.1
libheartbeat2-2.99.2-2.1
# rpm -qa|grep pacema
pacemaker-1.0.0-1.2
libpacemaker3-1.0.0-1.2
pacemaker-pygui-1.4-8.1

Oct 14 13:49:23 n02asp7 attrd: [7901]: info: main: Starting up....
Oct 14 13:49:23 n02asp7 attrd: [7901]: ERROR: main: HA Signon failed
Oct 14 13:49:23 n02asp7 attrd: [7901]: ERROR: main: Aborting startup
Oct 14 13:49:23 n02asp7 heartbeat: [7885]: WARN: Managed /usr/lib64/heartbeat/attrd process 7901 exited with return code 100.
Oct 14 13:49:23 n02asp7 cib: [7898]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Oct 14 13:49:23 n02asp7 cib: [7898]: info: G_main_add_TriggerHandler: Added signal manual handler
Oct 14 13:49:23 n02asp7 cib: [7898]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Oct 14 13:49:23 n02asp7 cib: [7898]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/crm/cib.xml.sig)
Oct 14 13:49:23 n02asp7 cib: [7898]: WARN: retrieveCib: Cluster configuration not found: /var/lib/heartbeat/crm/cib.xml
Oct 14 13:49:23 n02asp7 cib: [7898]: WARN: readCibXmlFile: Primary configuration corrupt or unusable, trying backup...
Oct 14 13:49:23 n02asp7 cib: [7898]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml.last (digest: /var/lib/heartbeat/crm/cib.xml.sig.last)
Oct 14 13:49:23 n02asp7 ccm: [7897]: info: Hostname: n02asp7
Oct 14 13:49:23 n02asp7 stonithd: [7900]: info: G_main_add_SignalHandler: Added signal handler for signal 10
Oct 14 13:49:23 n02asp7 stonithd: [7900]: info: G_main_add_SignalHandler: Added signal handler for signal 12
Oct 14 13:49:23 n02asp7 cib: [7898]: ERROR: validate_cib_digest: Digest comparision failed: expected dc90f2e743db61688a8cd6610c845ed2 (/var/lib/heartbeat/crm/cib.xml.sig.last), calculated 19e33575c865951da4f9cbf417207136
Oct 14 13:49:23 n02asp7 cib: [7898]: ERROR: retrieveCib: Checksum of /var/lib/heartbeat/crm/cib.xml.last failed! Configuration contents ignored!
Oct 14 13:49:23 n02asp7 cib: [7898]: ERROR: retrieveCib: Usually this is caused by manual changes, please refer to http://linux-ha.org/v2/faq/cib_changes_detected
Oct 14 13:49:23 n02asp7 cib: [7898]: WARN: retrieveCib: Continuing but /var/lib/heartbeat/crm/cib.xml.last will NOT used.
Oct 14 13:49:23 n02asp7 cib: [7898]: WARN: readCibXmlFile: Continuing with an empty configuration.
Oct 14 13:49:23 n02asp7 cib: [7898]: info: startCib: CIB Initialization completed successfully
Oct 14 13:49:23 n02asp7 cib: [7898]: CRIT: cib_init: Cannot sign in to the cluster... terminating

That's not good.
Can you turn debug on and re-post the complete log?


Oct 14 13:49:23 n02asp7 heartbeat: [7885]: WARN: Managed /usr/lib64/heartbeat/cib process 7898 exited with return code 100.
Oct 14 13:49:23 n02asp7 heartbeat: [7885]: EMERG: Rebooting system. Reason: /usr/lib64/heartbeat/cib

Change "crm yes" to "crm respawn" and heartbeat won't reboot the node every time a process exits.
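
For reference, that change is a one-line edit to heartbeat's ha.cf. The sketch below is only an illustration: the helper name crm_yes_to_respawn is made up for this example, and it assumes the directive sits on its own line as "crm yes". It reads a config on stdin and writes the edited copy to stdout, so nothing is changed in place.

```shell
# Hypothetical helper (not part of heartbeat): rewrite an active
# "crm yes" directive to "crm respawn". Commented-out lines and all
# other directives pass through unchanged.
crm_yes_to_respawn() {
    sed -E 's/^([[:space:]]*crm[[:space:]]+)yes[[:space:]]*$/\1respawn/'
}
```

Assuming ha.cf lives at /etc/ha.d/ha.cf on this install, something like `crm_yes_to_respawn < /etc/ha.d/ha.cf > /tmp/ha.cf.new` would give a candidate file to review before copying it into place and restarting heartbeat.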


Oct 14 13:49:23 n02asp7 lrmd: [7899]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Oct 14 13:49:23 n02asp7 lrmd: [7899]: info: G_main_add_SignalHandler: Added signal handler for signal 10



Rainer Traut wrote:
OK, after doing so and updating to the latest from the repo, this node keeps rebooting itself.
The only error I see is:
pengine: [12564]: WARN: text2task: Unsupported action: status
...
Oct 14 12:01:46 n02asp7 pengine: [12564]: notice: group_print: Resource Group: group_2
Oct 14 12:01:46 n02asp7 pengine: [12564]: notice: native_print: r-httpd (lsb:httpd): Started n01asp7
Oct 14 12:01:46 n02asp7 pengine: [12564]: notice: native_print: r-named (lsb:named): Started n01asp7
Oct 14 12:01:46 n02asp7 pengine: [12564]: notice: clone_print: Clone Set: ntpd-clone
Oct 14 12:01:46 n02asp7 pengine: [12564]: notice: native_print: c-ntpd:0 (lsb:ntpd): Started n01asp7
Oct 14 12:01:46 n02asp7 pengine: [12564]: notice: native_print: c-ntpd:1 (lsb:ntpd): Stopped
Oct 14 12:01:46 n02asp7 pengine: [12564]: WARN: text2task: Unsupported action: status
Oct 14 12:01:46 n02asp7 last message repeated 2 times
Oct 14 12:01:46 n02asp7 pengine: [12564]: WARN: native_color: Resource pingd-child:0 cannot run anywhere
Oct 14 12:01:46 n02asp7 pengine: [12564]: WARN: native_color: Resource drbd0:0 cannot run anywhere
Oct 14 12:01:46 n02asp7 pengine: [12564]: info: master_color: Promoting drbd0:1 (Master n01asp7)
Oct 14 12:01:46 n02asp7 pengine: [12564]: info: master_color: ms-drbd0: Promoted 1 instances of a possible 1 to master
Oct 14 12:01:46 n02asp7 last message repeated 2 times
Oct 14 12:01:46 n02asp7 pengine: [12564]: WARN: native_color: Resource c-ntpd:1 cannot run anywhere
Oct 14 12:01:46 n02asp7 pengine: [12564]: notice: NoRoleChange: Leave resource pingd-child:1 (Started n01asp7)
Oct 14 12:01:46 n02asp7 pengine: [12564]: notice: NoRoleChange: Leave resource drbd0:1 (Master n01asp7)
Oct 14 12:01:46 n02asp7 pengine: [12564]: notice: NoRoleChange: Leave resource drbd0:1 (Master n01asp7)
Oct 14 12:01:46 n02asp7 pengine: [12564]: notice: NoRoleChange: Leave resource r-srv (Started n01asp7)
Oct 14 12:01:46 n02asp7 pengine: [12564]: notice: NoRoleChange: Leave resource r-oeIP (Started n01asp7)
Oct 14 12:01:46 n02asp7 pengine: [12564]: notice: NoRoleChange: Leave resource r-email (Started n01asp7)
Oct 14 12:01:46 n02asp7 pengine: [12564]: notice: NoRoleChange: Leave resource r-httpd (Started n01asp7)
Oct 14 12:01:46 n02asp7 pengine: [12564]: WARN: text2task: Unsupported action: status
Oct 14 12:01:46 n02asp7 pengine: [12564]: notice: NoRoleChange: Leave resource r-named (Started n01asp7)
Oct 14 12:01:46 n02asp7 pengine: [12564]: WARN: text2task: Unsupported action: status
Oct 14 12:01:46 n02asp7 pengine: [12564]: notice: NoRoleChange: Leave resource c-ntpd:0 (Started n01asp7)
Oct 14 12:01:46 n02asp7 pengine: [12564]: WARN: text2task: Unsupported action: status
Oct 14 12:01:46 n02asp7 pengine: [12564]: info: stage6: Scheduling Node n02asp7 for shutdown
Oct 14 12:01:46 n02asp7 pengine: [12564]: WARN: text2task: Unsupported action: status
Oct 14 12:01:46 n02asp7 last message repeated 2 times
Oct 14 12:01:46 n02asp7 crmd: [7195]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
Oct 14 12:01:46 n02asp7 tengine: [12563]: info: process_te_message: Processing graph derived from /var/lib/heartbeat/pengine/pe-warn-1146.bz2
Oct 14 12:01:46 n02asp7 tengine: [12563]: info: unpack_graph: Unpacked transition 47: 1 actions in 1 synapses
Oct 14 12:01:46 n02asp7 tengine: [12563]: info: te_crm_command: Executing crm-event (70): do_shutdown on n02asp7
Oct 14 12:01:46 n02asp7 crmd: [7195]: info: handle_request: Shutting ourselves down (DC)

Andrew Beekhof wrote:
I'm in the middle of a heap of package changes at the moment which probably isn't helping. For now, try removing pacemaker-heartbeat (in retrospect, the way I implemented single-stack packages wasn't the most ideal) and just installing pacemaker.

On Oct 14, 2008, at 11:27 AM, Rainer Traut wrote:

Hi,

OS: Centos5 x86_64

When running yum update:

--> Running transaction check
---> Package heartbeat-common.x86_64 0:2.99.2-2.1 set to be updated
--> Processing Dependency: libplumbgpl.so.2()(64bit) for package: heartbeat-common
--> Processing Dependency: libapphb.so.2()(64bit) for package: heartbeat-common
--> Processing Dependency: libplumb.so.2()(64bit) for package: heartbeat-common
--> Processing Dependency: libpils.so.2()(64bit) for package: heartbeat-common
--> Processing Dependency: libapphb.so.0()(64bit) for package: pacemaker-heartbeat
--> Processing Dependency: libpils.so.1()(64bit) for package: pacemaker-heartbeat
--> Processing Dependency: libplumb.so.1()(64bit) for package: pacemaker-heartbeat
---> Package heartbeat-resources.x86_64 0:2.99.2-2.1 set to be updated
---> Package heartbeat-ldirectord.x86_64 0:2.99.2-2.1 set to be updated
---> Package pacemaker-pygui.x86_64 0:1.4-8.1 set to be updated
---> Package heartbeat.x86_64 0:2.99.2-2.1 set to be updated
--> Running transaction check
--> Processing Dependency: libapphb.so.0()(64bit) for package: pacemaker-heartbeat
--> Processing Dependency: libplumb.so.1()(64bit) for package: pacemaker-heartbeat
---> Package libheartbeat2.x86_64 0:2.99.2-2.1 set to be updated
---> Package heartbeat-pils.x86_64 0:2.1.3-3.el5.centos set to be updated
--> Processing Conflict: libheartbeat2 conflicts heartbeat-pils
--> Finished Dependency Resolution
Error: Missing Dependency: libplumb.so.1()(64bit) is needed by package pacemaker-heartbeat
Error: libheartbeat2 conflicts with heartbeat-pils
Error: Missing Dependency: libapphb.so.0()(64bit) is needed by package pacemaker-heartbeat

_______________________________________________
Pacemaker mailing list
[email protected]
http://list.clusterlabs.org/mailman/listinfo/pacemaker

