On 11/09/2013, at 2:57 PM, Andrey Groshev <gre...@yandex.ru> wrote:

> Hello Christine, Andrew and all.
>
> I'm sorry - a little was unwell, so did not answer.
> What we end this stream of messages?
> Who will change? corosync or pacemaker?
For now make sure you specify a nodeid and name.
Longer term, Chrissie is looking at making the combined data set available in a different namespace for pacemaker to use.

>
>
> 05.09.2013, 15:49, "Christine Caulfield" <ccaul...@redhat.com>:
>> On 05/09/13 11:33, Andrew Beekhof wrote:
>>
>>> On 05/09/2013, at 6:37 PM, Christine Caulfield <ccaul...@redhat.com> wrote:
>>>> On 03/09/13 22:03, Andrew Beekhof wrote:
>>>>> On 03/09/2013, at 11:49 PM, Christine Caulfield <ccaul...@redhat.com> wrote:
>>>>>> On 03/09/13 05:20, Andrew Beekhof wrote:
>>>>>>> On 02/09/2013, at 5:27 PM, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>>>>> 30.08.2013, 07:18, "Andrew Beekhof" <and...@beekhof.net>:
>>>>>>>>> On 29/08/2013, at 7:31 PM, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>>>>>>> 29.08.2013, 12:25, "Andrey Groshev" <gre...@yandex.ru>:
>>>>>>>>>>> 29.08.2013, 02:55, "Andrew Beekhof" <and...@beekhof.net>:
>>>>>>>>>>>> On 28/08/2013, at 5:38 PM, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>>>>>>>>>> 28.08.2013, 04:06, "Andrew Beekhof" <and...@beekhof.net>:
>>>>>>>>>>>>>> On 27/08/2013, at 1:13 PM, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>>>>>>>>>>>> 27.08.2013, 05:39, "Andrew Beekhof" <and...@beekhof.net>:
>>>>>>>>>>>>>>>> On 26/08/2013, at 3:09 PM, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>>>>>>>>>>>>>> 26.08.2013, 03:34, "Andrew Beekhof" <and...@beekhof.net>:
>>>>>>>>>>>>>>>>>> On 23/08/2013, at 9:39 PM, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Today I try remake my test cluster from cman to
>>>>>>>>>>>>>>>>>>> corosync2.
>>>>>>>>>>>>>>>>>>> I drew attention to the following:
>>>>>>>>>>>>>>>>>>> If I reset cluster with cman through cibadmin
>>>>>>>>>>>>>>>>>>> --erase --force
>>>>>>>>>>>>>>>>>>> In cib is still there exist names of nodes.
>>>>>>>>>>>>>>>>>> Yes, the cluster puts back entries for all the nodes
>>>>>>>>>>>>>>>>>> it know about automagically.
>>>>>>>>>>>>>>>>>>> cibadmin -Ql
>>>>>>>>>>>>>>>>>>> .....
>>>>>>>>>>>>>>>>>>> <nodes>
>>>>>>>>>>>>>>>>>>>   <node id="dev-cluster2-node2.unix.tensor.ru" uname="dev-cluster2-node2"/>
>>>>>>>>>>>>>>>>>>>   <node id="dev-cluster2-node4.unix.tensor.ru" uname="dev-cluster2-node4"/>
>>>>>>>>>>>>>>>>>>>   <node id="dev-cluster2-node3.unix.tensor.ru" uname="dev-cluster2-node3"/>
>>>>>>>>>>>>>>>>>>> </nodes>
>>>>>>>>>>>>>>>>>>> ....
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Even if cman and pacemaker running only one node.
>>>>>>>>>>>>>>>>>> I'm assuming all three are configured in cluster.conf?
>>>>>>>>>>>>>>>>> Yes, there exist list nodes.
>>>>>>>>>>>>>>>>>>> And if I do too on cluster with corosync2
>>>>>>>>>>>>>>>>>>> I see only names of nodes which run corosync and
>>>>>>>>>>>>>>>>>>> pacemaker.
>>>>>>>>>>>>>>>>>> Since you're not included your config, I can only
>>>>>>>>>>>>>>>>>> guess that your corosync.conf does not have a nodelist.
>>>>>>>>>>>>>>>>>> If it did, you should get the same behaviour.
>>>>>>>>>>>>>>>>> I try and expected_node and nodelist.
>>>>>>>>>>>>>>>> And it didn't work? What version of pacemaker?
>>>>>>>>>>>>>>> It does not work as I expected.
>>>>>>>>>>>>>> Thats because you've used IP addresses in the node list.
>>>>>>>>>>>>>> ie.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> node {
>>>>>>>>>>>>>>     ring0_addr: 10.76.157.17
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> try including the node name as well, eg.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> node {
>>>>>>>>>>>>>>     name: dev-cluster2-node2
>>>>>>>>>>>>>>     ring0_addr: 10.76.157.17
>>>>>>>>>>>>>> }
>>>>>>>>>>>>> The same thing.
>>>>>>>>>>>> I don't know what to say. I tested it here yesterday and it
>>>>>>>>>>>> worked as expected.
>>>>>>>>>>> I found that the reason that You and I have different results -
>>>>>>>>>>> I did not have reverse DNS zone for these nodes.
>>>>>>>>>>> I know what it should be, but (PACEMAKER + CMAN) worked without
>>>>>>>>>>> a reverse area!
>>>>>>>>>> Hasty. Deleted all. Reinstalled. Configured. Not working again.
>>>>>>>>>> Damn!
>>>>>>>>> It would have surprised me... pacemaker 1.1.11 doesn't do any dns
>>>>>>>>> lookups - reverse or otherwise.
>>>>>>>>> Can you set
>>>>>>>>>
>>>>>>>>>     PCMK_trace_files=corosync.c
>>>>>>>>>
>>>>>>>>> in your environment and retest?
>>>>>>>>>
>>>>>>>>> On RHEL6 that means putting the following in /etc/sysconfig/pacemaker
>>>>>>>>>     export PCMK_trace_files=corosync.c
>>>>>>>>>
>>>>>>>>> It should produce additional logging[1] that will help diagnose the
>>>>>>>>> issue.
>>>>>>>>>
>>>>>>>>> [1] http://blog.clusterlabs.org/blog/2013/pacemaker-logging/
>>>>>>>> Hello, Andrew.
>>>>>>>>
>>>>>>>> You are a little misunderstood me.
>>>>>>> No, I understood you fine.
>>>>>>>> I wrote that I rushed to judgment.
>>>>>>>> After I did the reverse DNS zone, the cluster behaved correctly.
>>>>>>>> BUT after I took apart the cluster dropped configs and restarted on
>>>>>>>> the new cluster,
>>>>>>>> cluster again don't showed all the nodes in the nodes (only node with
>>>>>>>> running pacemaker).
>>>>>>>>
>>>>>>>> A small portion of the log. Full log
>>>>>>>> In which (I thought) there is something interesting.
>>>>>>>>
>>>>>>>> Aug 30 12:31:11 [9986] dev-cluster2-node4 cib: ( corosync.c:423 ) trace: check_message_sanity: Verfied message 4: (dest=<all>:cib, from=dev-cluster2-node4:cib.9986, compressed=0, size=1551, total=2143)
>>>>>>>> Aug 30 12:31:11 [9989] dev-cluster2-node4 attrd: ( corosync.c:96 ) trace: corosync_node_name: Checking 172793107 vs 0 from nodelist.node.0.nodeid
>>>>>>>> Aug 30 12:31:11 [9989] dev-cluster2-node4 attrd: ( ipcc.c:378 ) debug: qb_ipcc_disconnect: qb_ipcc_disconnect()
>>>>>>>> Aug 30 12:31:11 [9989] dev-cluster2-node4 attrd: (ringbuffer.c:294 ) debug: qb_rb_close: Closing ringbuffer: /dev/shm/qb-cmap-request-9616-9989-27-header
>>>>>>>> Aug 30 12:31:11 [9989] dev-cluster2-node4 attrd: (ringbuffer.c:294 ) debug: qb_rb_close: Closing ringbuffer: /dev/shm/qb-cmap-response-9616-9989-27-header
>>>>>>>> Aug 30 12:31:11 [9989] dev-cluster2-node4 attrd: (ringbuffer.c:294 ) debug: qb_rb_close: Closing ringbuffer: /dev/shm/qb-cmap-event-9616-9989-27-header
>>>>>>>> Aug 30 12:31:11 [9989] dev-cluster2-node4 attrd: ( corosync.c:134 ) notice: corosync_node_name: Unable to get node name for nodeid 172793107
>>>>>>> I wonder if you need to be including the nodeid too. ie.
>>>>>>>
>>>>>>> node {
>>>>>>>     name: dev-cluster2-node2
>>>>>>>     ring0_addr: 10.76.157.17
>>>>>>>     nodeid: 2
>>>>>>> }
>>>>>>>
>>>>>>> I _thought_ that was implicit.
>>>>>>> Chrissie: is "nodelist.node.%d.nodeid" always available for corosync2
>>>>>>> or only if explicitly defined in the config?
>>>>>> You do need to specify a nodeid if you don't want corosync to imply it
>>>>>> from the IP address (or you're using IPv6). corosync won't imply a
>>>>>> nodeif from the order of the nodes in corosync.conf - that's not
>>>>>> reliable enough.
>>>>> Right, but is that implied nodeid available as "nodelist.node.%d.nodeid"?
>>>>> Andrey's results suggest "no" and I would claim this is not
>>>>> expected/good :)
>>>> If you want to get the nodeid of the node you are on
>>> No, we're trying to use a known nodeid to look up the other information in
>>> the node list - such as 'ring0_addr' or 'name'.
>>
>> votequorum_get_info()
>>
>> Chrissie
>>
>>>> there is both a corosync API call for it - totem_nodeid_get() - or you
>>>> can get it from votequorum via cmap - runtime.votequorum.this_node_id
>>>>
>>>> The nodelist.* section of cmap is really meant to reflect what is in
>>>> corosync.conf and I don't really want to be writing into it. I know there
>>>> is already nodelist.our_node_pos, but I'm not a fan of that either :P
>>>>
>>>> Chrissie
>>>>>> Also bear in mind that 0 is not a valid node number :-)
>>>>>>
>>>>>> Chrissie
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
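
To illustrate the lookup discussed in the thread, here is a minimal Python sketch. It is not pacemaker's code: it only models, under assumptions, the scan that the "Checking 172793107 vs 0 from nodelist.node.0.nodeid" log line suggests. The key names (nodelist.node.N.nodeid, name, ring0_addr) come from the thread; the function node_name_for, the plain dict standing in for cmap, and the choice to read a missing nodeid key as 0 are hypothetical. The last line decodes the logged nodeid as a big-endian IPv4 integer, which is an observation about that one value and not a statement of how corosync derives nodeids in every configuration.

```python
import ipaddress


def node_name_for(cmap, nodeid):
    """Scan nodelist.node.N.* entries for a matching nodeid; return its name or None."""
    pos = 0
    while f"nodelist.node.{pos}.ring0_addr" in cmap:
        # Assumption: a missing nodeid key is read as 0, mirroring the "vs 0" in the log.
        candidate = int(cmap.get(f"nodelist.node.{pos}.nodeid", 0))
        if candidate == nodeid:
            return cmap.get(f"nodelist.node.{pos}.name")
        pos += 1
    return None


# Hypothetical stand-in for cmap: a nodelist entry with a name but no nodeid.
implicit = {
    "nodelist.node.0.ring0_addr": "10.76.157.17",
    "nodelist.node.0.name": "dev-cluster2-node2",
}
# The same entry with the nodeid spelled out, as suggested in the thread.
explicit = {**implicit, "nodelist.node.0.nodeid": "2"}

print(node_name_for(implicit, 172793107))  # None
print(node_name_for(explicit, 2))          # dev-cluster2-node2
print(ipaddress.ip_address(172793107))     # 10.76.157.19
```

With an explicit nodeid the name is found; without it the scan only ever compares against 0 and never matches, which is consistent with the advice to specify both a nodeid and a name for each node.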