On Thu, Jun 21, 2018 at 10:47 AM Jason Gauthier <jagauth...@gmail.com> wrote:
>
> On Thu, Jun 21, 2018 at 9:49 AM Jan Pokorný <jpoko...@redhat.com> wrote:
> >
> > On 21/06/18 07:05 -0400, Jason Gauthier wrote:
> > > On Thu, Jun 21, 2018 at 5:11 AM Christine Caulfield <ccaul...@redhat.com>
> > > wrote:
> > >> On 19/06/18 18:47, Jason Gauthier wrote:
> > >>> Attached!
> > >>
> > >> That's very odd. I can see communication with the server and corosync in
> > >> there (do it's doing something) but no logging at all. When I start
> > >> qdevice on my systems it logs loads of messages even if it doesn't
> > >> manage to contact the server. Do you have any logging entries in
> > >> corosync.conf that might be stopping it?
> > >
> > > I haven't checked the corosync logs for any entries before, but I just
> > > did. There isn't anything logged.
> >
> > What about syslog entries (may boil down to /var/log/messages,
> > journald log, or whatever sink is configured)?
>
> I took a look, since both you and Chrissie mentioned that.
>
> There aren't any new entries added to any of the /var/log files.
>
> # corosync-qdevice -f -d
> # date
> Thu Jun 21 10:36:06 EDT 2018
>
> # ls -lt|head
> total 152072
> -rw-r----- 1 root adm 68018 Jun 21 10:34 auth.log
> -rw-rw-r-- 1 root utmp 18704352 Jun 21 10:34 lastlog
> -rw-rw-r-- 1 root utmp 107136 Jun 21 10:34 wtmp
> -rw-r----- 1 root adm 248444 Jun 21 10:34 daemon.log
> -rw-r----- 1 root adm 160899 Jun 21 10:34 syslog
> -rw-r----- 1 root adm 1119856 Jun 21 09:46 kern.log
>
> I did look through daemon, messages, and syslog just to be sure.
>
> > >> Where did the binary come from? did you build it yourself or is it from
> > >> a package? I wonder if it's got corrupted or is a bad version. Possibly
> > >> linked against a 'dodgy' libqb - there have been some things going on
> > >> there that could cause logging to go missing in some circumstances.
> > >>
> > >> Honza (the qdevice expert) is away at the moment, so I'm guessing a bit
> > >> here anyway!
> > >
> > > Hmm. Interesting. I installed the debian package. When it didn't
> > > work, I grabbed the source from github. They both act the same way,
> > > but if there is an underlying library issue then that will continue to
> > > be a problem.
> > >
> > > It doesn't say much:
> > > /usr/lib/x86_64-linux-gnu/libqb.so.0.18.1
> >
> > You are likely using libqb v1.0.1.
>
> Correct. I didn't even think to look at the output of dpkg -l for the
> package version.
> Debian 9 also packages binutils-2.28
>
> > Ability to figure out the proper package version is one of the most
> > basic skills to provide useful diagnostics about the issues with
> > distro-provided packages.
> >
> > With Debian, the proper incantation seems to be
> >
> > dpkg -s libqb-dev | grep -i version
> >
> > or
> >
> > apt list libqb-dev
> >
> > (or substitute libqb0 for libqb-dev).
> >
> > As Chrissie mentioned, there is some fishiness possible if you happen
> > to use ld linker from binutils 2.29+ for the building with this old
> > libqb in the mix, so if the issues persist and logging seems to be
> > missing, try recompiling with the downgraded binutils package below
> > said breakage point.
>
> Since the system already has a lower numbered binutils (2.28) I wonder
> if I should attempt to build a newer version of the libqb library.
>
> As Chrissie mentioned, I will open a bug with Debian in the Interim.
> But I don't believe I will see resolution to that any time soon. :)

I was finally able to look at this problem again, and found that qnetd
is giving me some messaging, but I don't know what to do with it.

Jun 29 16:34:35 debug New client connected
Jun 29 16:34:35 debug cluster name = zeta
Jun 29 16:34:35 debug tls started = 1
Jun 29 16:34:35 debug tls peer certificate verified = 1
Jun 29 16:34:35 debug node_id = 1084772368
Jun 29 16:34:35 debug pointer = 0x563afd609d70
Jun 29 16:34:35 debug addr_str = ::ffff:192.168.80.16:38010
Jun 29 16:34:35 debug ring id = (40a85010.89ec)
Jun 29 16:34:35 debug cluster dump:
Jun 29 16:34:35 debug client = ::ffff:192.168.80.16:38010, node_id = 1084772368
Jun 29 16:34:35 debug Client ::ffff:192.168.80.16:38010 (cluster zeta, node_id 1084772368) sent initial node list.
Jun 29 16:34:35 debug msg seq num 4
Jun 29 16:34:35 debug node list:
Jun 29 16:34:35 error ffsplit: Received empty config node list for client ::ffff:192.168.80.16:38010
Jun 29 16:34:35 error Algorithm returned error code. Sending error reply.
Jun 29 16:34:35 debug Client ::ffff:192.168.80.16:38010 (cluster zeta, node_id 1084772368) sent membership node list.
Jun 29 16:34:35 debug msg seq num 5
Jun 29 16:34:35 debug ring id = (40a85010.89ec)
Jun 29 16:34:35 debug node list:
Jun 29 16:34:35 debug node_id = 1084772368, data_center_id = 0, node_state = not set
Jun 29 16:34:35 debug node_id = 1084772369, data_center_id = 0, node_state = not set
Jun 29 16:34:35 debug Algorithm result vote is Ask later
Jun 29 16:34:35 debug Client ::ffff:192.168.80.16:38010 (cluster zeta, node_id 1084772368) sent quorum node list.
Jun 29 16:34:35 debug msg seq num 6
Jun 29 16:34:35 debug quorate = 1
Jun 29 16:34:35 debug node list:
Jun 29 16:34:35 debug node_id = 1084772368, data_center_id = 0, node_state = member
Jun 29 16:34:35 debug node_id = 1084772369, data_center_id = 0, node_state = member

It looks like "config node list" is empty, but the other lists are not.
I'm not sure where it's getting that node list from.
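
To double-check that reading of the log, here is a minimal Python sketch
that groups the node entries by the kind of list the client sent. It is
only an illustration: the names collect_node_lists, SENT_RE and NODE_RE
are made up for this example, the patterns are guessed from the excerpt
above rather than taken from any documented qnetd log format, and I am
assuming the "initial" list is the one the ffsplit error calls the
"config node list".

```python
import re

# Excerpt of the qnetd debug output quoted above.
SAMPLE_LOG = """\
Jun 29 16:34:35 debug Client ::ffff:192.168.80.16:38010 (cluster zeta, node_id 1084772368) sent initial node list.
Jun 29 16:34:35 debug msg seq num 4
Jun 29 16:34:35 debug node list:
Jun 29 16:34:35 error ffsplit: Received empty config node list for client ::ffff:192.168.80.16:38010
Jun 29 16:34:35 error Algorithm returned error code. Sending error reply.
Jun 29 16:34:35 debug Client ::ffff:192.168.80.16:38010 (cluster zeta, node_id 1084772368) sent membership node list.
Jun 29 16:34:35 debug msg seq num 5
Jun 29 16:34:35 debug ring id = (40a85010.89ec)
Jun 29 16:34:35 debug node list:
Jun 29 16:34:35 debug node_id = 1084772368, data_center_id = 0, node_state = not set
Jun 29 16:34:35 debug node_id = 1084772369, data_center_id = 0, node_state = not set
Jun 29 16:34:35 debug Algorithm result vote is Ask later
Jun 29 16:34:35 debug Client ::ffff:192.168.80.16:38010 (cluster zeta, node_id 1084772368) sent quorum node list.
Jun 29 16:34:35 debug msg seq num 6
Jun 29 16:34:35 debug quorate = 1
Jun 29 16:34:35 debug node list:
Jun 29 16:34:35 debug node_id = 1084772368, data_center_id = 0, node_state = member
Jun 29 16:34:35 debug node_id = 1084772369, data_center_id = 0, node_state = member
"""

# Hypothetical patterns, inferred from the excerpt only.
SENT_RE = re.compile(r"sent (\w+) node list")
NODE_RE = re.compile(r"node_id = (\d+), data_center_id = \d+, node_state = (.+)$")


def collect_node_lists(log_text):
    """Map each list kind (initial, membership, quorum) to its node entries."""
    lists = {}
    current = None
    for line in log_text.splitlines():
        sent = SENT_RE.search(line)
        if sent:
            # A "sent <kind> node list" line opens a new, so far empty, list.
            current = sent.group(1)
            lists[current] = []
            continue
        node = NODE_RE.search(line)
        if node and current is not None:
            lists[current].append((int(node.group(1)), node.group(2).strip()))
    return lists


if __name__ == "__main__":
    for kind, nodes in collect_node_lists(SAMPLE_LOG).items():
        status = "EMPTY" if not nodes else f"{len(nodes)} node(s)"
        print(f"{kind}: {status}")
```

Run against the excerpt, this reports the "initial" list as empty and two
nodes each for the membership and quorum lists, which matches what I see
by eye.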

For fun, I added

nodelist {
    node {
        alpha: 192.168.80.16
    }
    node {
        beta: 192.168.80.17
    }
}
}

to corosync.conf, and restarted both nodes. But that didn't help.
_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org