Hi Vlad,
No, I haven't. 2.34 wasn't available when I had the problem. FYI, I'm
still using glib-2.30.3 and I'll probably stay there until 2.34.0 is
stable or 2.32.4-r2 comes out :)
Pat.
On 05/11/2012 7:57 AM, Vladimir Elisseev wrote:
Patrick,
Gentoo devs suggested trying >=dev-libs/glib-2.34. Have you tried this
already?
Regards,
Vlad.
On Sun, 2012-11-04 at 08:45 -0800, Patrick Irvine wrote:
Hey Guys,
Just for the record, I just noticed this thread and it sounded familiar.
I checked my gentoo systems and I had to mask glib-2.32.4-r1 and use
glib-2.30.3 in order to get corosync/pacemaker to work. I had the same
problem. Nodes couldn't talk to each other. Sorry I didn't notice this
thread earlier, as I might have been able to help.
Pat.
On 04/11/2012 3:15 AM, Vladimir Elisseev wrote:
Thanks for the explanation. I had already seen coredumps in the
directories you mentioned. The "suspicious -r1" includes two patches over
the vanilla version of glib:
https://bugzilla.gnome.org/show_bug.cgi?id=679306
http://sources.gentoo.org/cgi-bin/viewvc.cgi/gentoo-x86/dev-libs/glib/files/glib-2.32.4-CVE-2012-3524.patch?view=markup
For the moment I simply masked this particular glib version. Hopefully
I'll be able to find time to do a complete debug as you described.
Regards,
Vlad.
On Sun, 2012-11-04 at 13:54 +0300, Vladislav Bogdanov wrote:
03.11.2012 18:22, Vladimir Elisseev wrote:
Vladislav,
Thanks for the hint! Upgrading glib from 2.30.3 to 2.32.4 triggers this
behavior of corosync. Do you know where I can find more info regarding
this problem?
That is not corosync but pacemaker, which heavily uses glib internally.
And glib is the only package in your list which may affect pacemaker.
I would say that is a regression in that specific glib version or build.
Library behavior changed without bumping major so-number.
You'd better talk to your distribution maintainers. And the -r1 in the
glib version you installed looks suspicious. Do you know what it means?
One more note: cib exits with signal 6 (SIGABRT), which usually means
you hit an assert in the code. That usually results in a memory dump.
Look in /var/lib/heartbeat/cores or /var/lib/pacemaker/cores to see
whether you have relevant core files. If not, you need to enable coredumps.
Then install debuginfo packages for pacemaker and glib (that is very
distribution specific, so I cannot help with that). After that you can
analyze the relevant core files with 'gdb <full_path_to_cib_binary>
<core_dump_file>'
Just run 'bt full' and that should be enough to find exactly which code
path caused the SIGABRT.
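
The steps above could be scripted roughly as follows. This is only a sketch: the helper names are made up for illustration, the two core directories are the ones named above, and it assumes gdb is installed and accepts its usual -batch and -ex options. Core file naming varies between systems, so the "core*" pattern is a guess.

```python
import glob
import os
import subprocess

# Directories named in the message above; adjust for your installation.
CORE_DIRS = ("/var/lib/heartbeat/cores", "/var/lib/pacemaker/cores")


def find_core_files(core_dirs=CORE_DIRS):
    """Return core files found under the given directories, newest first."""
    found = []
    for directory in core_dirs:
        # Cores may sit in per-user subdirectories, so search recursively.
        # The "core*" file name pattern is an assumption.
        pattern = os.path.join(directory, "**", "core*")
        found.extend(
            path for path in glob.glob(pattern, recursive=True)
            if os.path.isfile(path)
        )
    return sorted(found, key=os.path.getmtime, reverse=True)


def gdb_backtrace_command(binary, core_file):
    """Build a non-interactive gdb call that runs 'bt full' and exits."""
    return ["gdb", "-batch", "-ex", "bt full", binary, core_file]


def dump_backtraces(binary, core_dirs=CORE_DIRS):
    """Print a full backtrace for every core file that was found."""
    for core_file in find_core_files(core_dirs):
        print("==", core_file)
        subprocess.run(gdb_backtrace_command(binary, core_file), check=False)
```

Calling dump_backtraces() with the full path to the cib binary prints one backtrace per core file. Without the debuginfo packages the output will mostly be unresolved addresses.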
Vladislav
Vlad.
On Sat, 2012-11-03 at 16:22 +0300, Vladislav Bogdanov wrote:
03.11.2012 15:26, Vladimir Elisseev wrote:
I've been able to reproduce the problem. I've attached crm_report
tarballs from both nodes. I don't know which particular package
triggers this problem, but below is the list of what has been updated.
Hopefully this helps.
I bet that is glib.
Vladislav
Regards,
Vlad.
Sat Nov 3 12:15:40 2012 <<< sys-apps/busybox-1.20.2
Sat Nov 3 12:15:42 2012 >>> sys-apps/busybox-1.20.2
Sat Nov 3 12:15:50 2012 <<< sys-fs/dosfstools-3.0.9
Sat Nov 3 12:15:52 2012 >>> sys-fs/dosfstools-3.0.12
Sat Nov 3 12:16:00 2012 <<< dev-lang/nasm-2.10.01
Sat Nov 3 12:16:02 2012 >>> dev-lang/nasm-2.10.05
Sat Nov 3 12:16:11 2012 <<< dev-libs/libgamin-0.1.10-r2
Sat Nov 3 12:16:13 2012 >>> dev-libs/libgamin-0.1.10-r3
Sat Nov 3 12:16:40 2012 <<< media-fonts/droid-113-r1
Sat Nov 3 12:16:46 2012 >>> media-fonts/droid-113-r2
Sat Nov 3 12:16:54 2012 <<< media-libs/libpng-1.5.10
Sat Nov 3 12:16:56 2012 >>> media-libs/libpng-1.5.13-r1
Sat Nov 3 12:17:04 2012 <<< app-arch/unzip-6.0-r1
Sat Nov 3 12:17:05 2012 >>> app-arch/unzip-6.0-r3
Sat Nov 3 12:17:12 2012 <<< app-arch/rpm2targz-9.0.0.4g
Sat Nov 3 12:17:14 2012 >>> app-arch/rpm2targz-9.0.0.5g
Sat Nov 3 12:17:22 2012 <<< app-arch/pbzip2-1.1.5
Sat Nov 3 12:17:24 2012 >>> app-arch/pbzip2-1.1.8
Sat Nov 3 12:17:34 2012 <<< app-arch/zip-3.0
Sat Nov 3 12:17:35 2012 >>> app-arch/zip-3.0-r1
Sat Nov 3 12:17:43 2012 <<< sys-process/htop-1.0.1
Sat Nov 3 12:17:45 2012 >>> sys-process/htop-1.0.1-r1
Sat Nov 3 12:17:55 2012 <<< media-libs/tiff-4.0.2
Sat Nov 3 12:17:57 2012 >>> media-libs/tiff-4.0.2-r1
Sat Nov 3 12:18:04 2012 <<< net-ftp/tftp-hpa-5.1
Sat Nov 3 12:18:06 2012 >>> net-ftp/tftp-hpa-5.2
Sat Nov 3 12:18:18 2012 <<< media-video/ffmpeg-0.10.3
Sat Nov 3 12:18:20 2012 >>> media-video/ffmpeg-0.10.3
Sat Nov 3 12:18:35 2012 <<< sys-devel/gettext-0.18.1.1-r1
Sat Nov 3 12:18:37 2012 >>> sys-devel/gettext-0.18.1.1-r3
Sat Nov 3 12:18:44 2012 <<< app-admin/logrotate-3.8.1
Sat Nov 3 12:18:46 2012 >>> app-admin/logrotate-3.8.2
Sat Nov 3 12:18:54 2012 <<< media-libs/libwebp-0.1.3
Sat Nov 3 12:18:55 2012 >>> media-libs/libwebp-0.2.0
Sat Nov 3 12:19:03 2012 <<< dev-perl/Convert-ASN1-0.220.0
Sat Nov 3 12:19:05 2012 >>> dev-perl/Convert-ASN1-0.260.0
Sat Nov 3 12:19:13 2012 <<< dev-perl/net-server-0.97
Sat Nov 3 12:19:15 2012 >>> dev-perl/net-server-2.6.0
Sat Nov 3 12:19:24 2012 <<< dev-perl/Config-IniFiles-2.710.0
Sat Nov 3 12:19:26 2012 >>> dev-perl/Config-IniFiles-2.760.0
Sat Nov 3 12:19:33 2012 <<< dev-perl/HTTP-Date-6.0.0
Sat Nov 3 12:19:35 2012 >>> dev-perl/HTTP-Date-6.20.0
Sat Nov 3 12:19:44 2012 <<< sys-boot/syslinux-4.06_pre11
Sat Nov 3 12:19:46 2012 >>> sys-boot/syslinux-4.06
Sat Nov 3 12:20:05 2012 <<< dev-libs/glib-2.30.3
Sat Nov 3 12:20:08 2012 >>> dev-libs/glib-2.32.4-r1
Sat Nov 3 12:20:16 2012 <<< dev-util/pkgconfig-0.27
Sat Nov 3 12:20:18 2012 >>> dev-util/pkgconfig-0.27.1
Sat Nov 3 12:20:28 2012 <<< net-analyzer/jnettop-0.13.0-r1
Sat Nov 3 12:20:29 2012 >>> net-analyzer/jnettop-0.13.0-r1
Sat Nov 3 12:20:41 2012 <<< x11-libs/pango-1.29.4
Sat Nov 3 12:20:43 2012 >>> x11-libs/pango-1.30.1
Sat Nov 3 12:20:53 2012 <<< net-analyzer/rrdtool-1.4.5-r1
Sat Nov 3 12:20:56 2012 >>> net-analyzer/rrdtool-1.4.7-r1
Sat Nov 3 12:21:03 2012 <<< app-shells/gentoo-bashcomp-20101217
Sat Nov 3 12:21:05 2012 >>> app-shells/gentoo-bashcomp-20101217-r1
Sat Nov 3 12:21:12 2012 <<< dev-perl/MIME-tools-5.502.0
Sat Nov 3 12:21:14 2012 >>> dev-perl/MIME-tools-5.503.0
Sat Nov 3 12:21:24 2012 <<< dev-perl/Convert-TNEF-0.170.0
Sat Nov 3 12:21:26 2012 >>> dev-perl/Convert-TNEF-0.180.0
Sat Nov 3 12:21:35 2012 <<< net-misc/curl-7.25.0-r1
Sat Nov 3 12:21:36 2012 >>> net-misc/curl-7.26.0
Sat Nov 3 12:21:51 2012 <<< mail-mta/postfix-2.9.3
Sat Nov 3 12:21:53 2012 >>> mail-mta/postfix-2.9.4
Sat Nov 3 12:22:01 2012 <<< dev-perl/Net-SSLeay-1.360.0
Sat Nov 3 12:22:03 2012 >>> dev-perl/Net-SSLeay-1.480.0-r1
Sat Nov 3 12:22:12 2012 <<< sys-auth/nss_ldap-264-r1
Sat Nov 3 12:22:14 2012 >>> sys-auth/nss_ldap-265-r1
Sat Nov 3 12:22:25 2012 <<< net-mail/fetchmail-6.3.21
Sat Nov 3 12:22:27 2012 >>> net-mail/fetchmail-6.3.22
Sat Nov 3 12:22:37 2012 <<< net-misc/dhcp-4.2.4_p1
Sat Nov 3 12:22:39 2012 >>> net-misc/dhcp-4.2.4_p2
Sat Nov 3 12:22:48 2012 <<< net-analyzer/tcpdump-3.9.8-r1
Sat Nov 3 12:22:50 2012 >>> net-analyzer/tcpdump-4.3.0
Sat Nov 3 12:23:07 2012 <<< dev-util/cmake-2.8.8-r3
Sat Nov 3 12:23:09 2012 >>> dev-util/cmake-2.8.9
Sat Nov 3 12:23:21 2012 <<< dev-vcs/subversion-1.6.17-r7
Sat Nov 3 12:23:24 2012 >>> dev-vcs/subversion-1.6.17-r7
Sat Nov 3 12:27:56 2012 <<< media-gfx/imagemagick-6.7.8.7
Sat Nov 3 12:27:58 2012 >>> media-gfx/imagemagick-6.7.8.7
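
To narrow down a list like the one above, same-version rebuilds can be filtered out so that only real version changes remain. A minimal sketch, assuming lines in exactly the format shown ("<<<" for the removed version, then ">>>" for the installed one); the function name is hypothetical and other log formats would need a different pattern.

```python
import re

# Timestamp, then "<<<" (removed) or ">>>" (installed), then category/name-version.
LINE_RE = re.compile(r"^.*?\s(<<<|>>>)\s+(\S+)\s*$")
# The version starts at the last "-" that is followed by a digit.
ATOM_RE = re.compile(r"^(.+?)-(\d[^/]*)$")


def changed_packages(lines):
    """Return (name, old_version, new_version) for real version changes."""
    removed = {}
    changes = []
    for line in lines:
        line_match = LINE_RE.match(line.strip())
        if not line_match:
            continue
        operation, atom = line_match.groups()
        atom_match = ATOM_RE.match(atom)
        if not atom_match:
            continue
        name, version = atom_match.groups()
        if operation == "<<<":
            # Remember the version that was removed.
            removed[name] = version
        elif name in removed:
            old_version = removed.pop(name)
            # Skip rebuilds where the version stayed the same.
            if old_version != version:
                changes.append((name, old_version, version))
    return changes
```

Fed the listing above, this drops entries such as busybox and ffmpeg, which were rebuilt at the same version, and keeps entries such as glib 2.30.3 -> 2.32.4-r1.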
On Thu, 2012-11-01 at 07:08 +0100, Vladimir Elisseev wrote:
Yes, hb_report is there, thanks!
On Thu, 2012-11-01 at 11:40 +1100, Andrew Beekhof wrote:
On Tue, Oct 30, 2012 at 4:35 PM, Vladimir Elisseev <vo...@vovan.nl> wrote:
Thanks for trying to help! Currently I can't provide crm_report from the
failed node, as I've decided to restore the complete node from backup.
The versions I use are corosync-1.3.0 and pacemaker-1.0.10. Actually the
problem occurred after updating quite a few system packages, but all the
cluster related software was untouched. I've found exactly the same
issue described in the mailing list earlier:
http://www.gossamer-threads.com/lists/linuxha/pacemaker/77881?do=post_view_threaded#77881
At least the symptoms are exactly the same, as are the pasted log files.
I've tried enabling debug logging as well and saw that crm tries to
connect to the cib sockets (/var/run/crm_*) too early (IMO) and fails
because cib wasn't started yet.
I'm planning to repeat the update of this system, but I'll do it
more carefully in order to understand which particular package leads to
this behavior. BTW, how can I create a crm_report? I can't find this
binary anywhere on the system.
It's included in subsequent 1.0.x releases.
You should have hb_report available though.
Let me know what kind of input you'll
need if I'm able to reproduce this problem.
Regards,
Vlad.
On Tue, 2012-10-30 at 16:00 +1100, Andrew Beekhof wrote:
On Sun, Oct 28, 2012 at 9:05 PM, Vladimir Elisseev <vo...@vovan.nl> wrote:
Hello,
I'm having a problem: after a reboot, one cluster node can't join the
cluster anymore. From the log file I can't understand what is actually
going on. I can only see that cib and crm are both respawned frequently.
I'd appreciate any help. Below is the relevant part of the log file:
I appreciate that you're trying to keep it brief, but problems often
originate much earlier than people suspect.
Can you instead attach a crm_report tarball? That will have everything
(from both nodes) that we need to be able to help.
What version is this btw?
Oct 28 10:52:22 srv2 cib: [10646]: info: cib_server_process_diff: Requesting
re-sync from peer
Oct 28 10:52:22 srv2 cib: [10646]: WARN: cib_diff_notify: Local-only Change
(client:crmd, call: 4770): -1.-1.-1 (Application of an update diff failed,
requesting a full refresh)
Oct 28 10:52:22 srv2 cib: [10653]: info: retrieveCib: Reading cluster
configuration from: /var/lib/heartbeat/crm/cib.qJTUAV (digest:
/var/lib/heartbeat/crm/cib.XwOKXQ)
Oct 28 10:52:22 srv2 cib: [10646]: WARN: cib_server_process_diff: Not applying
diff 0.1298.5 -> 0.1299.1 (sync in progress)
Oct 28 10:52:22 srv2 cib: [10646]: info: cib_replace_notify: Local-only
Replace: -1.-1.-1 from srv1
Oct 28 10:52:22 corosync [pcmk]: ] info: pcmk_ipc_exit: Client cib
(conn=0x1837340, async-conn=0x1837340) left
Oct 28 10:52:22 corosync [pcmk]: ] ERROR: pcmk_wait_dispatch: Child process
cib terminated with signal 6 (pid=10646, core=true)
Oct 28 10:52:22 corosync [pcmk]: ] notice: pcmk_wait_dispatch: Respawning
failed child process: cib
Oct 28 10:52:22 corosync [pcmk]: ] info: spawn_child: Forked child 10656 for
process cib
Oct 28 10:52:22 srv2 cib: [10656]: info: Invoked: /usr/lib64/heartbeat/cib
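
To see how often each child process dies in a longer log, the "terminated with signal" messages can be counted. A rough sketch, assuming the message wording matches the excerpt above (other pacemaker versions may word it differently); the function names are made up for illustration. Whitespace is matched loosely so that lines wrapped by the mail client still match.

```python
import re
from collections import Counter

# Pattern taken from the single pcmk_wait_dispatch message in the excerpt above.
CRASH_RE = re.compile(
    r"Child\s+process\s+(?P<name>\S+)\s+terminated\s+with\s+signal\s+"
    r"(?P<signal>\d+)\s+\(pid=(?P<pid>\d+),\s+core=(?P<core>true|false)\)"
)


def child_crashes(log_text):
    """Return (process, signal, pid, core_dumped) for each crash message."""
    return [
        (m["name"], int(m["signal"]), int(m["pid"]), m["core"] == "true")
        for m in CRASH_RE.finditer(log_text)
    ]


def crash_counts(log_text):
    """Count crash messages per (process, signal) pair."""
    return Counter(
        (name, signal) for name, signal, _pid, _core in child_crashes(log_text)
    )
```

For the excerpt above, child_crashes() reports a single entry: cib, signal 6, pid 10646, with a core dump written.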
Regards,
Vlad.
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org