Re: [Pacemaker] different behavior cibadmin -Ql with cman and corosync2

Andrew Beekhof Thu, 12 Sep 2013 00:05:21 -0700

On 11/09/2013, at 2:57 PM, Andrey Groshev <gre...@yandex.ru> wrote:

> Hello Christine, Andrew and all.
> 
> I'm sorry - I was a little unwell, so I did not answer.
> How do we conclude this thread?
> Which one will change: corosync or pacemaker?


For now, make sure you specify both a nodeid and a name for each node in the 
corosync.conf nodelist.
Longer term, Chrissie is looking at making the combined data set available in a 
different namespace for pacemaker to use.
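
A rough way to check an existing config against that advice is sketched below.
It is only a sketch: it assumes each node sits in a flat `node { ... }` block
of `key: value` lines, and the function names parse_nodelist and
check_nodelist are made up for illustration (they are not part of pacemaker or
corosync). It flags nodes without a name or nodeid, and a nodeid of 0, which
Chrissie points out further down is not a valid node number.

```python
import re

# Sample fragment built from the node entries quoted in this thread.
SAMPLE = """
nodelist {
  node {
    ring0_addr: 10.76.157.17
  }
  node {
    name: dev-cluster2-node2
    ring0_addr: 10.76.157.17
    nodeid: 2
  }
}
"""


def parse_nodelist(conf_text):
    """Return one dict of key/value pairs per `node { ... }` block."""
    nodes = []
    # `node\s*\{` does not match `nodelist {`, so only node blocks are found.
    for body in re.findall(r"\bnode\s*\{([^{}]*)\}", conf_text):
        entry = {}
        for line in body.splitlines():
            line = line.split("#", 1)[0].strip()  # drop comments
            if ":" in line:
                # Split on the first colon only, so IPv6 addresses survive.
                key, value = line.split(":", 1)
                entry[key.strip()] = value.strip()
        nodes.append(entry)
    return nodes


def check_nodelist(conf_text):
    """Return a list of human-readable problems; empty means no problems."""
    problems = []
    for index, node in enumerate(parse_nodelist(conf_text)):
        label = node.get("name") or node.get("ring0_addr") or f"node #{index}"
        if "name" not in node:
            problems.append(f"{label}: missing name")
        nodeid = node.get("nodeid")
        if nodeid is None:
            problems.append(f"{label}: missing nodeid")
        elif not nodeid.isdigit() or int(nodeid) == 0:
            problems.append(f"{label}: invalid nodeid {nodeid!r}")
    return problems


if __name__ == "__main__":
    for problem in check_nodelist(SAMPLE):
        print(problem)
```

To check a real file, read it and pass its text to check_nodelist(). The path
is usually /etc/corosync/corosync.conf, but that is an assumption about your
install.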

> 
> 
> 05.09.2013, 15:49, "Christine Caulfield" <ccaul...@redhat.com>:
>> On 05/09/13 11:33, Andrew Beekhof wrote:
>> 
>>>  On 05/09/2013, at 6:37 PM, Christine Caulfield <ccaul...@redhat.com> wrote:
>>>>  On 03/09/13 22:03, Andrew Beekhof wrote:
>>>>>  On 03/09/2013, at 11:49 PM, Christine Caulfield <ccaul...@redhat.com> 
>>>>> wrote:
>>>>>>  On 03/09/13 05:20, Andrew Beekhof wrote:
>>>>>>>  On 02/09/2013, at 5:27 PM, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>>>>>  30.08.2013, 07:18, "Andrew Beekhof" <and...@beekhof.net>:
>>>>>>>>>  On 29/08/2013, at 7:31 PM, Andrey Groshev <gre...@yandex.ru> wrote:
>>>>>>>>>>    29.08.2013, 12:25, "Andrey Groshev" <gre...@yandex.ru>:
>>>>>>>>>>>    29.08.2013, 02:55, "Andrew Beekhof" <and...@beekhof.net>:
>>>>>>>>>>>>     On 28/08/2013, at 5:38 PM, Andrey Groshev <gre...@yandex.ru> 
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>      28.08.2013, 04:06, "Andrew Beekhof" <and...@beekhof.net>:
>>>>>>>>>>>>>>      On 27/08/2013, at 1:13 PM, Andrey Groshev 
>>>>>>>>>>>>>> <gre...@yandex.ru> wrote:
>>>>>>>>>>>>>>>       27.08.2013, 05:39, "Andrew Beekhof" <and...@beekhof.net>:
>>>>>>>>>>>>>>>>       On 26/08/2013, at 3:09 PM, Andrey Groshev 
>>>>>>>>>>>>>>>> <gre...@yandex.ru> wrote:
>>>>>>>>>>>>>>>>>        26.08.2013, 03:34, "Andrew Beekhof" 
>>>>>>>>>>>>>>>>> <and...@beekhof.net>:
>>>>>>>>>>>>>>>>>>        On 23/08/2013, at 9:39 PM, Andrey Groshev 
>>>>>>>>>>>>>>>>>> <gre...@yandex.ru> wrote:
>>>>>>>>>>>>>>>>>>>         Hello,
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>         Today I tried to convert my test cluster from cman to 
>>>>>>>>>>>>>>>>>>> corosync2.
>>>>>>>>>>>>>>>>>>>         I noticed the following:
>>>>>>>>>>>>>>>>>>>         If I reset a cman cluster with cibadmin 
>>>>>>>>>>>>>>>>>>> --erase --force,
>>>>>>>>>>>>>>>>>>>         the names of the nodes still exist in the cib.
>>>>>>>>>>>>>>>>>>        Yes, the cluster puts back entries for all the nodes 
>>>>>>>>>>>>>>>>>> it knows about automagically.
>>>>>>>>>>>>>>>>>>>         cibadmin -Ql
>>>>>>>>>>>>>>>>>>>         .....
>>>>>>>>>>>>>>>>>>>            <nodes>
>>>>>>>>>>>>>>>>>>>              <node id="dev-cluster2-node2.unix.tensor.ru" 
>>>>>>>>>>>>>>>>>>> uname="dev-cluster2-node2"/>
>>>>>>>>>>>>>>>>>>>              <node id="dev-cluster2-node4.unix.tensor.ru" 
>>>>>>>>>>>>>>>>>>> uname="dev-cluster2-node4"/>
>>>>>>>>>>>>>>>>>>>              <node id="dev-cluster2-node3.unix.tensor.ru" 
>>>>>>>>>>>>>>>>>>> uname="dev-cluster2-node3"/>
>>>>>>>>>>>>>>>>>>>            </nodes>
>>>>>>>>>>>>>>>>>>>         ....
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>         Even if cman and pacemaker are running on only one node.
>>>>>>>>>>>>>>>>>>        I'm assuming all three are configured in cluster.conf?
>>>>>>>>>>>>>>>>>        Yes, the list of nodes is there.
>>>>>>>>>>>>>>>>>>>         And if I do the same on a cluster with corosync2,
>>>>>>>>>>>>>>>>>>>         I see only the names of nodes that run corosync and 
>>>>>>>>>>>>>>>>>>> pacemaker.
>>>>>>>>>>>>>>>>>>        Since you've not included your config, I can only 
>>>>>>>>>>>>>>>>>> guess that your corosync.conf does not have a nodelist.
>>>>>>>>>>>>>>>>>>        If it did, you should get the same behaviour.
>>>>>>>>>>>>>>>>>        I tried both expected_node and nodelist.
>>>>>>>>>>>>>>>>       And it didn't work? What version of pacemaker?
>>>>>>>>>>>>>>>       It does not work as I expected.
>>>>>>>>>>>>>>      That's because you've used IP addresses in the node list.
>>>>>>>>>>>>>>      ie.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>      node {
>>>>>>>>>>>>>>        ring0_addr: 10.76.157.17
>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>      try including the node name as well, eg.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>      node {
>>>>>>>>>>>>>>        name: dev-cluster2-node2
>>>>>>>>>>>>>>        ring0_addr: 10.76.157.17
>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>      The same thing.
>>>>>>>>>>>>     I don't know what to say.  I tested it here yesterday and it 
>>>>>>>>>>>> worked as expected.
>>>>>>>>>>>    I found the reason that you and I have different results - 
>>>>>>>>>>> I did not have a reverse DNS zone for these nodes.
>>>>>>>>>>>    I know there should be one, but (PACEMAKER + CMAN) worked without 
>>>>>>>>>>> a reverse zone!
>>>>>>>>>>    I was too hasty. Deleted everything. Reinstalled. Configured. Not 
>>>>>>>>>> working again. Damn!
>>>>>>>>>  It would have surprised me... pacemaker 1.1.11 doesn't do any dns 
>>>>>>>>> lookups - reverse or otherwise.
>>>>>>>>>  Can you set
>>>>>>>>> 
>>>>>>>>>    PCMK_trace_files=corosync.c
>>>>>>>>> 
>>>>>>>>>  in your environment and retest?
>>>>>>>>> 
>>>>>>>>>  On RHEL6 that means putting the following in /etc/sysconfig/pacemaker
>>>>>>>>>     export PCMK_trace_files=corosync.c
>>>>>>>>> 
>>>>>>>>>  It should produce additional logging[1] that will help diagnose the 
>>>>>>>>> issue.
>>>>>>>>> 
>>>>>>>>>  [1] http://blog.clusterlabs.org/blog/2013/pacemaker-logging/
>>>>>>>>  Hello, Andrew.
>>>>>>>> 
>>>>>>>>  You misunderstood me a little.
>>>>>>>  No, I understood you fine.
>>>>>>>>  I wrote that I rushed to judgment.
>>>>>>>>  After I set up the reverse DNS zone, the cluster behaved correctly.
>>>>>>>>  BUT after I took the cluster apart, dropped the configs and started a 
>>>>>>>> new cluster,
>>>>>>>>  the cluster again did not show all the nodes in the nodes section (only 
>>>>>>>> the node with pacemaker running).
>>>>>>>> 
>>>>>>>>  A small portion of the log. Full log
>>>>>>>>  In which (I thought) there is something interesting.
>>>>>>>> 
>>>>>>>>  Aug 30 12:31:11 [9986] dev-cluster2-node4        cib: (  
>>>>>>>> corosync.c:423   )   trace: check_message_sanity:      Verfied message 
>>>>>>>> 4: (dest=<all>:cib, from=dev-cluster2-node4:cib.9986, compressed=0, 
>>>>>>>> size=1551, total=2143)
>>>>>>>>  Aug 30 12:31:11 [9989] dev-cluster2-node4      attrd: (  
>>>>>>>> corosync.c:96    )   trace: corosync_node_name:        Checking 
>>>>>>>> 172793107 vs 0 from nodelist.node.0.nodeid
>>>>>>>>  Aug 30 12:31:11 [9989] dev-cluster2-node4      attrd: (      
>>>>>>>> ipcc.c:378   )   debug: qb_ipcc_disconnect:        qb_ipcc_disconnect()
>>>>>>>>  Aug 30 12:31:11 [9989] dev-cluster2-node4      attrd: 
>>>>>>>> (ringbuffer.c:294   )   debug: qb_rb_close:       Closing ringbuffer: 
>>>>>>>> /dev/shm/qb-cmap-request-9616-9989-27-header
>>>>>>>>  Aug 30 12:31:11 [9989] dev-cluster2-node4      attrd: 
>>>>>>>> (ringbuffer.c:294   )   debug: qb_rb_close:       Closing ringbuffer: 
>>>>>>>> /dev/shm/qb-cmap-response-9616-9989-27-header
>>>>>>>>  Aug 30 12:31:11 [9989] dev-cluster2-node4      attrd: 
>>>>>>>> (ringbuffer.c:294   )   debug: qb_rb_close:       Closing ringbuffer: 
>>>>>>>> /dev/shm/qb-cmap-event-9616-9989-27-header
>>>>>>>>  Aug 30 12:31:11 [9989] dev-cluster2-node4      attrd: (  
>>>>>>>> corosync.c:134   )  notice: corosync_node_name:        Unable to get 
>>>>>>>> node name for nodeid 172793107
>>>>>>>  I wonder if you need to be including the nodeid too. ie.
>>>>>>> 
>>>>>>>  node {
>>>>>>>    name: dev-cluster2-node2
>>>>>>>    ring0_addr: 10.76.157.17
>>>>>>>    nodeid: 2
>>>>>>>  }
>>>>>>> 
>>>>>>>  I _thought_ that was implicit.
>>>>>>>  Chrissie: is "nodelist.node.%d.nodeid" always available for corosync2 
>>>>>>> or only if explicitly defined in the config?
>>>>>>  You do need to specify a nodeid if you don't want corosync to imply it 
>>>>>> from the IP address (or you're using IPv6). corosync won't imply a 
>>>>>> nodeid from the order of the nodes in corosync.conf - that's not 
>>>>>> reliable enough.
>>>>>  Right, but is that implied nodeid available as "nodelist.node.%d.nodeid"?
>>>>>  Andrey's results suggest "no" and I would claim this is not 
>>>>> expected/good :)
>>>>  If you want to get the nodeid of the node you are on
>>>  No, we're trying to use a known nodeid to look up the other information in 
>>> the node list - such as 'ring0_addr' or 'name'.
>> 
>> votequorum_get_info()
>> 
>> Chrissie
>> 
>>>>  there is a corosync API call for it - totem_nodeid_get() - or you 
>>>> can get it from votequorum via cmap - runtime.votequorum.this_node_id
>>>> 
>>>>  The nodelist.* section of cmap is really meant to reflect what is in 
>>>> corosync.conf and I don't really want to be writing into it. I know there 
>>>> is already nodelist.our_node_pos, but I'm not a fan of that either :P
>>>> 
>>>>  Chrissie
>>>>>>  Also bear in mind that 0 is not a valid node number :-)
>>>>>> 
>>>>>>  Chrissie
>> 
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> 
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
