Re: [Pacemaker] One more globally-unique clone question
On 17 Jan 2015, at 1:25 am, Vladislav Bogdanov <bub...@hoster-ok.com> wrote:

> Hi all,
>
> Trying to reproduce the problem with the early stop of globally-unique clone instances during a move to another node, I found one more interesting problem. Due to the different order of resources in the CIB and extensive use of constraints between other resources (an odd number of resources cluster-wide), two CLUSTERIP instances are always allocated to the same node in the new testing cluster.

Ah, so this is why broker-vips:1 was moving.

> What would be the best/preferred way to make them run on different nodes by default?

By default they will. I'm assuming it's the constraints that are preventing this. Getting them to auto-rebalance is the harder problem.

> I see the following options:
>
> * Raise the priority of the globally-unique clone so that its instances are always allocated first.
> * Use utilization attributes (with high values for nodes and low values for cluster resources).
> * Anything else?
>
> If I configure virtual IPs one-by-one (without a clone), I can add colocation constraints with negative scores between them. I do not see a way to scale that setup well though (5-10 IPs). So, what would be the best option to achieve the same with a globally-unique cloned resource? Maybe there should be some internal preference/colocation not to place instances together (like the default stickiness=1 for clones)? Or even allow a special negative colocation constraint with the same resource in both 'what' and 'with' (colocation col1 -1: clone clone)?
Best,
Vladislav

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
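On the scaling complaint: for standalone VIPs, the pairwise constraints can at least be generated rather than typed by hand. A minimal sketch (assumed resource IDs `vip1..vip5` and an assumed score of -1000; none of this comes from the thread) that prints one crmsh colocation line per pair:

```shell
#!/bin/bash
# Print a negative-score colocation constraint for every pair of VIPs,
# i.e. n(n-1)/2 lines -- exactly the growth the poster objects to.
vips="vip1 vip2 vip3 vip4 vip5"
for a in $vips; do
  for b in $vips; do
    # emit each unordered pair once, using lexical order
    [[ $a < $b ]] && echo "colocation ${a}-apart-${b} -1000: $a $b"
  done
done
```

Each emitted line could be fed to `crm configure`. With a globally-unique clone there is no per-instance ID to name in such a constraint, which is the gap the poster is pointing at.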
Re: [Pacemaker] Unique clone instance is stopped too early on move
On 16 Jan 2015, at 3:59 pm, Vladislav Bogdanov <bub...@hoster-ok.com> wrote:

> 16.01.2015 07:44, Andrew Beekhof wrote:
>> On 15 Jan 2015, at 3:11 pm, Vladislav Bogdanov <bub...@hoster-ok.com> wrote:
>>> 13.01.2015 11:32, Andrei Borzenkov wrote:
>>>> On Tue, Jan 13, 2015 at 10:20 AM, Vladislav Bogdanov <bub...@hoster-ok.com> wrote:
>>>>> Hi Andrew, David, all.
>>>>>
>>>>> I found a somewhat strange operation ordering during transition execution. Could you please look at the following partial configuration (crmsh syntax)?
>>>>>
>>>>> ===
>>>>> ...
>>>>> clone cl-broker broker \
>>>>>   meta interleave=true target-role=Started
>>>>> clone cl-broker-vips broker-vips \
>>>>>   meta clone-node-max=2 globally-unique=true interleave=true resource-stickiness=0 target-role=Started
>>>>> clone cl-ctdb ctdb \
>>>>>   meta interleave=true target-role=Started
>>>>> colocation broker-vips-with-broker inf: cl-broker-vips cl-broker
>>>>> colocation broker-with-ctdb inf: cl-broker cl-ctdb
>>>>> order broker-after-ctdb inf: cl-ctdb cl-broker
>>>>> order broker-vips-after-broker 0: cl-broker cl-broker-vips
>>>>> ...
>>>>> ===
>>>>>
>>>>> After I put one node into standby and then back online, I see the following transition (relevant excerpt):
>>>>>
>>>>> ===
>>>>> * Pseudo action: cl-broker-vips_stop_0
>>>>> * Resource action: broker-vips:1 stop on c-pa-0
>>>>> * Pseudo action: cl-broker-vips_stopped_0
>>>>> * Pseudo action: cl-ctdb_start_0
>>>>> * Resource action: ctdb start on c-pa-1
>>>>> * Pseudo action: cl-ctdb_running_0
>>>>> * Pseudo action: cl-broker_start_0
>>>>> * Resource action: ctdb monitor=1 on c-pa-1
>>>>> * Resource action: broker start on c-pa-1
>>>>> * Pseudo action: cl-broker_running_0
>>>>> * Pseudo action: cl-broker-vips_start_0
>>>>> * Resource action: broker monitor=1 on c-pa-1
>>>>> * Resource action: broker-vips:1 start on c-pa-1
>>>>> * Pseudo action: cl-broker-vips_running_0
>>>>> * Resource action: broker-vips:1 monitor=3 on c-pa-1
>>>>> ===
>>>>>
>>>>> What could be a reason to stop a unique clone instance so early for a move?
>>>>
>>>> Do not take this as a definitive answer, but cl-broker-vips cannot run unless both other resources are started. So if you compute the closure of all required transitions it looks rather logical. Having cl-broker-vips started while broker is still stopped would violate the constraint.
>>>
>>> The problem is that broker-vips:1 is stopped on one (source) node unnecessarily early.
>>
>> It looks to be moving from c-pa-0 to c-pa-1. It might be unnecessarily early, but it is what you asked for... we have to unwind the resource stack before we can build it up.
>
> Yes, I understand that it is valid, but could its stop be delayed until the cluster is in a state where all dependencies are satisfied to start it on another node (like a migration)?

No, because we have to unwind the resource stack before we can build it up. Doing anything else would be one of those things that is trivial for a human to identify but rather complex for a computer.

Better to look at why broker-vips:1 needed to be moved.

> Like:
>
> ===
> * Pseudo action: cl-ctdb_start_0
> * Resource action: ctdb start on c-pa-1
> * Pseudo action: cl-ctdb_running_0
> * Pseudo action: cl-broker_start_0
> * Resource action: ctdb monitor=1 on c-pa-1
> * Resource action: broker start on c-pa-1
> * Pseudo action: cl-broker_running_0
> * Pseudo action: cl-broker-vips_start_0
> * Resource action: broker monitor=1 on c-pa-1
> * Pseudo action: cl-broker-vips_stop_0
> * Resource action: broker-vips:1 stop on c-pa-0
> * Pseudo action: cl-broker-vips_stopped_0
> * Resource action: broker-vips:1 start on c-pa-1
> * Pseudo action: cl-broker-vips_running_0
> * Resource action: broker-vips:1 monitor=3 on c-pa-1
> ===
>
> That would be a great optimization toward five nines...
Best,
Vladislav

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
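Worth noting for readers following "Better to look at why broker-vips:1 needed to be moved": the configuration quoted above sets resource-stickiness=0 on cl-broker-vips, which is what allows the instance to move back once the node returns. A non-zero stickiness would keep it in place. A sketch of that tweak (my suggestion, not something proposed in the thread):

```
clone cl-broker-vips broker-vips \
    meta clone-node-max=2 globally-unique=true interleave=true \
         resource-stickiness=1 target-role=Started
```

With stickiness greater than zero, the instance's current placement outweighs a tie-score alternative node, so no stop/start cycle is scheduled at all.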
Re: [Pacemaker] [corosync] CoroSync's UDPu transport for public IP addresses?
Dmitry,

> Great, it works! Thank you. It would be extremely helpful if this information were included in the default corosync.conf as comments:
>
> - regarding the allowed and even preferred absence of totem.interface in the case of UDPu

Yep.

> - that the quorum section must not be empty, and that the default quorum.provider could be corosync_votequorum (but not empty).

This is not entirely true. quorum.provider cannot be an empty string; generally it must be a valid provider like corosync_votequorum. But an unspecified quorum.provider works without any problem (as in the example configuration file). The truth is that Pacemaker must then be configured in a way that does not require quorum.

Regards,
  Honza

> It would help novices to install and launch corosync instantly.
>
> On Fri, Jan 16, 2015 at 7:31 PM, Jan Friesse <jfrie...@redhat.com> wrote:
>> Dmitry Koterov napsal(a):
>>> such messages (for now). But, anyway, DNS names in ringX_addr seem not to be working, and no relevant messages appear in the default logs. Maybe add some validation for ringX_addr?
>>>
>>> I'm having resolvable DNS names:
>>>
>>> root@node1:/etc/corosync# ping -c1 -W100 node1 | grep from
>>> 64 bytes from node1 (127.0.1.1): icmp_seq=1 ttl=64 time=0.039 ms
>>
>> This is the problem. Resolving node1 to localhost (127.0.0.1) is simply wrong. Names you want to use in corosync.conf should resolve to an interface address. I believe the other nodes have a similar setting (so node2 resolved on node2 is again 127.0.0.1).
>
> Wow! What a shame! How could I miss it... So you're absolutely right, thanks: that was the cause, an entry in /etc/hosts. On some machines I removed it manually, but on others I didn't. Now I do it automatically with sed -i -r "/^.*[[:space:]]$host([[:space:]]|\$)/d" /etc/hosts in the initialization script. I apologize for the mess.
>
> So now I have only one place in corosync.conf where I need to specify a plain IP address for UDPu: totem.interface.bindnetaddr. If I specify 0.0.0.0 there, I get the message "Service engine 'corosync_quorum' failed to load for reason 'configuration error: nodelist or quorum.expected_votes must be configured!'" in the logs (BTW, it does not say that I made a mistake in bindnetaddr). Is there a way to completely untie from IP addresses?

You can just remove the whole interface section completely. Corosync will find the correct address from the nodelist.

Regards,
  Honza

>> Please try to fix this problem first and let's see if this will solve the issue you are hitting.
>>
>> Regards,
>>   Honza
>>
>>> root@node1:/etc/corosync# ping -c1 -W100 node2 | grep from
>>> 64 bytes from node2 (188.166.54.190): icmp_seq=1 ttl=55 time=88.3 ms
>>> root@node1:/etc/corosync# ping -c1 -W100 node3 | grep from
>>> 64 bytes from node3 (128.199.116.218): icmp_seq=1 ttl=51 time=252 ms
>>>
>>> With the corosync.conf below, nothing works:
>>>
>>> ...
>>> nodelist {
>>>     node {
>>>         ring0_addr: node1
>>>     }
>>>     node {
>>>         ring0_addr: node2
>>>     }
>>>     node {
>>>         ring0_addr: node3
>>>     }
>>> }
>>> ...
>>>
>>> Jan 14 10:47:44 node1 corosync[15061]: [MAIN ] Corosync Cluster Engine ('2.3.3'): started and ready to provide service.
>>> Jan 14 10:47:44 node1 corosync[15061]: [MAIN ] Corosync built-in features: dbus testagents rdma watchdog augeas pie relro bindnow
>>> Jan 14 10:47:44 node1 corosync[15062]: [TOTEM ] Initializing transport (UDP/IP Unicast).
>>> Jan 14 10:47:44 node1 corosync[15062]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1
>>> Jan 14 10:47:44 node1 corosync[15062]: [TOTEM ] The network interface [a.b.c.d] is now up.
>>> Jan 14 10:47:44 node1 corosync[15062]: [SERV ] Service engine loaded: corosync configuration map access [0]
>>> Jan 14 10:47:44 node1 corosync[15062]: [QB] server name: cmap
>>> Jan 14 10:47:44 node1 corosync[15062]: [SERV ] Service engine loaded: corosync configuration service [1]
>>> Jan 14 10:47:44 node1 corosync[15062]: [QB] server name: cfg
>>> Jan 14 10:47:44 node1 corosync[15062]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
>>> Jan 14 10:47:44 node1 corosync[15062]: [QB] server name: cpg
>>> Jan 14 10:47:44 node1 corosync[15062]: [SERV ] Service engine loaded: corosync profile loading service [4]
>>> Jan 14 10:47:44 node1 corosync[15062]: [WD] No Watchdog, try modprobe a watchdog
>>> Jan 14 10:47:44 node1 corosync[15062]: [WD] no resources configured.
>>> Jan 14 10:47:44 node1 corosync[15062]: [SERV ] Service engine loaded: corosync watchdog service [7]
>>> Jan 14 10:47:44 node1 corosync[15062]: [QUORUM] Using quorum provider corosync_votequorum
>>> Jan 14 10:47:44 node1 corosync[15062]: [QUORUM] Quorum provider: corosync_votequorum failed to initialize.
>>> Jan 14 10:47:44 node1 corosync[15062]: [SERV ] Service engine 'corosync_quorum' failed to load for reason 'configuration error: nodelist or quorum.expected_votes must be configured!'
>>> Jan 14 10:47:44 node1 corosync[15062]: [MAIN ] Corosync Cluster Engine exiting with status 20 at service.c:356.
>>>
>>> But with IP addresses specified in ringX_addr, everything works:
>>>
>>> ...
>>> nodelist {
>>>     node {
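Putting Honza's advice together, a minimal UDPu configuration with no totem.interface section might look roughly like this (a sketch with made-up addresses; as discussed above, any names used instead of IPs must resolve to real interface addresses, not 127.0.x.x):

```
totem {
    version: 2
    # Unicast UDP: member addresses come from the nodelist,
    # so no interface/bindnetaddr section is required
    transport: udpu
}

nodelist {
    node {
        ring0_addr: 10.0.0.1
    }
    node {
        ring0_addr: 10.0.0.2
    }
    node {
        ring0_addr: 10.0.0.3
    }
}

quorum {
    # Must be a valid provider (or left unspecified entirely);
    # an empty string is rejected
    provider: corosync_votequorum
}
```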
Re: [Pacemaker] One more globally-unique clone question
20.01.2015 02:47, Andrew Beekhof wrote:

> On 17 Jan 2015, at 1:25 am, Vladislav Bogdanov <bub...@hoster-ok.com> wrote:
>> Hi all,
>>
>> Trying to reproduce the problem with the early stop of globally-unique clone instances during a move to another node, I found one more interesting problem. Due to the different order of resources in the CIB and extensive use of constraints between other resources (an odd number of resources cluster-wide), two CLUSTERIP instances are always allocated to the same node in the new testing cluster.
>
> Ah, so this is why broker-vips:1 was moving.

Those are two different 2-node clusters with a different order of resources. In the first one, broker-vips come after an even number of resources, and one instance wants to return to its mother node after it is brought back online, thus broker-vips:1 is moving. In the second one, broker-vips come after an odd number of resources (actually three more resources are allocated to one node due to constraints) and both broker-vips go to the other node.

>> What would be the best/preferred way to make them run on different nodes by default?
>
> By default they will. I'm assuming it's the constraints that are preventing this.

I only see that they are allocated like any other resources.

> Getting them to auto-rebalance is the harder problem

I see. Should it be possible to solve it without using priority or utilization?

>> I see the following options:
>>
>> * Raise the priority of the globally-unique clone so that its instances are always allocated first.
>> * Use utilization attributes (with high values for nodes and low values for cluster resources).
>> * Anything else?
>>
>> If I configure virtual IPs one-by-one (without a clone), I can add colocation constraints with negative scores between them. I do not see a way to scale that setup well though (5-10 IPs). So, what would be the best option to achieve the same with a globally-unique cloned resource? Maybe there should be some internal preference/colocation not to place instances together (like the default stickiness=1 for clones)? Or even allow a special negative colocation constraint with the same resource in both 'what' and 'with' (colocation col1 -1: clone clone)?

Best,
Vladislav

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
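The utilization option from the list above might be sketched like this in crmsh (illustrative only: the attribute name, values, and node names are made up, and a placement strategy must be set for utilization to take effect at all):

```
# Spread load (and with it the clone instances) across nodes
property placement-strategy=balanced

# High capacity on nodes, low consumption per resource, as suggested
node c-pa-0 utilization cpu=100
node c-pa-1 utilization cpu=100

primitive broker-vips ocf:heartbeat:IPaddr2 \
    params ip=192.168.100.1 clusterip_hash=sourceip \
    utilization cpu=1
```

With the balanced strategy, the node with the lower accumulated utilization is preferred for the next instance, which tends to split the two CLUSTERIP instances between nodes without pairwise constraints.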
Re: [Pacemaker] Corosync fails to start when NIC is absent
One more thing to clarify. You said the rebind can be avoided - what does that mean?

Thank you,
Kostya

On Wed, Jan 14, 2015 at 1:31 PM, Kostiantyn Ponomarenko <konstantin.ponomare...@gmail.com> wrote:

> Thank you. Now I am aware of it.
>
> Thank you,
> Kostya
>
> On Wed, Jan 14, 2015 at 12:59 PM, Jan Friesse <jfrie...@redhat.com> wrote:
>> Kostiantyn,
>>
>>> Honza,
>>> Thank you for helping me. So, there is no defined behavior in case one of the interfaces is not in the system?
>>
>> You are right. There is no defined behavior.
>>
>> Regards,
>>   Honza
>>
>>> Thank you,
>>> Kostya
>>>
>>> On Tue, Jan 13, 2015 at 12:01 PM, Jan Friesse <jfrie...@redhat.com> wrote:
>>>> Kostiantyn,
>>>>
>>>>> According to https://access.redhat.com/solutions/638843, the interface that is defined in corosync.conf must be present in the system (see the bottom of the article, section ROOT CAUSE). To confirm that, I made a couple of tests. Here is a part of the corosync.conf file (in free-write form) (the original config file is also attached):
>>>>>
>>>>> ===
>>>>> rrp_mode: passive
>>>>> ring0_addr is defined in corosync.conf
>>>>> ring1_addr is defined in corosync.conf
>>>>> ===
>>>>>
>>>>> --- Two-node cluster ---
>>>>>
>>>>> Test #1: IP for ring0 is not defined in the system.
>>>>> Start Corosync simultaneously on both nodes. Corosync fails to start. From the logs:
>>>>> Jan 08 09:43:56 [2992] A6-402-2 corosync error [MAIN ] parse error in config: No interfaces defined
>>>>> Jan 08 09:43:56 [2992] A6-402-2 corosync error [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1343.
>>>>> Result: Corosync and Pacemaker are not running.
>>>>>
>>>>> Test #2: IP for ring1 is not defined in the system.
>>>>> Start Corosync simultaneously on both nodes. Corosync starts. Start Pacemaker simultaneously on both nodes. Pacemaker fails to start. From the logs, the last writes from corosync:
>>>>> Jan 8 16:31:29 daemon.err27 corosync[3728]: [TOTEM ] Marking ringid 0 interface 169.254.1.3 FAULTY
>>>>> Jan 8 16:31:30 daemon.notice29 corosync[3728]: [TOTEM ] Automatically recovered ring 0
>>>>> Result: Corosync and Pacemaker are not running.
>>>>>
>>>>> Test #3: rrp_mode: active leads to the same result, except that the Corosync and Pacemaker init scripts return status "running". But /var/log/cluster/corosync.log still shows a lot of errors like:
>>>>> Jan 08 16:30:47 [4067] A6-402-1 cib: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
>>>>> Result: Corosync and Pacemaker show their statuses as running, but crm_mon cannot connect to the cluster database, and half of Pacemaker's services are not running (including the Cluster Information Base (CIB)).
>>>>>
>>>>> --- Single-node mode ---
>>>>>
>>>>> IP for ring0 is not defined in the system: Corosync fails to start.
>>>>> IP for ring1 is not defined in the system: Corosync and Pacemaker are started. It is possible that the configuration will be applied successfully (50%); it is possible that the cluster is not running any resources; it is possible that the node cannot be put into standby mode (shows: communication error); and it is possible that the cluster is running all resources but the applied configuration is not guaranteed to be fully loaded (some rules can be missed).
>>>>>
>>>>> --- Conclusions ---
>>>>>
>>>>> It is possible that in some rare cases (see comments to the bug) the cluster will work, but in that case its working state is unstable and the cluster can stop working at any moment.
>>>>>
>>>>> So, is this correct? Do my assumptions make any sense? I didn't find any other explanation on the network...
>>>>
>>>> Corosync needs all interfaces during start and runtime. This doesn't mean they must be connected (that would make corosync unusable for physical NIC/switch or cable failure), but they must be up and have a correct IP. When this is not the case, corosync rebinds to localhost and weird things happen. Removal of this rebinding is a long-time TODO, but there are still more important bugs (especially because the rebind can be avoided).
>>>>
>>>> Regards,
>>>>   Honza
>>>>
>>>>> Thank you,
>>>>> Kostya
>>>>>
>>>>> On Fri, Jan 9, 2015 at 11:10 AM, Kostiantyn Ponomarenko <konstantin.ponomare...@gmail.com> wrote:
>>>>>> Hi guys,
>>>>>>
>>>>>> Corosync fails to start if there is no such network interface configured in the system. Even with rrp_mode: passive the problem is the same when at least one network interface is not configured in the system. Is this the expected behavior? I thought that when you use redundant rings, it is enough to have at least
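On the "rebind can be avoided" remark: the thread never spells it out, but one commonly used trick (an assumption on my part, not Honza's stated answer) is to make sure the ring address always exists, for example by assigning it to a dummy interface, so corosync never loses its bind address when the physical NIC is absent:

```
# Hypothetical workaround: keep the ring1 address permanently assigned
# to a dummy interface (address and interface name are illustrative)
modprobe dummy
ip link add ring1dummy type dummy
ip addr add 169.254.1.3/24 dev ring1dummy
ip link set ring1dummy up
```

The address stays up regardless of the physical hardware, so corosync can bind to it at start and only marks the ring faulty, rather than rebinding to localhost.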
Re: [Pacemaker] no nodes on both hosts
Hi,

I recompiled all packages, and now it works... But /var/run/crm is also empty in the new installation - is this OK?

Sent: Monday, 19 January 2015 at 11:07
From: Thomas Manninger <dbgtmas...@gmx.at>
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] no nodes on both hosts

Hi,

Now I see that in /var/run/crm there are no socket files; the directory is empty. How can I debug the problem?

Sent: Monday, 19 January 2015 at 10:18
From: Thomas Manninger <dbgtmas...@gmx.at>
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] no nodes on both hosts

Hi,

I reinstalled Debian in the same VM and used the Debian pacemaker/corosync packages; everything works fine. Then I recompiled the newest pacemaker/corosync packages, with the same problem...

Jan 19 10:13:31 [24271] pacemaker2 cib: info: cib_process_request: Completed cib_modify operation for section nodes: OK (rc=0, origin=pacemaker2/crmd/3, version=0.0.0)

Does this line mean that the node is added to cib.xml? But there are no nodes:

root@pacemaker2:/var/lib/pacemaker/cib# cat cib.xml
<cib crm_feature_set="3.0.9" validate-with="pacemaker-2.0" epoch="0" num_updates="0" admin_epoch="0" cib-last-written="Mon Jan 19 10:13:30 2015">
  <configuration>
    <crm_config/>
    <nodes/>
    <resources/>
    <constraints/>
  </configuration>
</cib>

I also changed the permissions of the cib folder to 777... Can someone help me?

Thanks!

Sent: Friday, 16 January 2015 at 16:51
From: Thomas Manninger <dbgtmas...@gmx.at>
To: pacemaker@oss.clusterlabs.org
Subject: [Pacemaker] no nodes on both hosts

Hi,

I use Debian 7. At first I used the standard Debian packages, and pacemaker worked perfectly. Now I have compiled my own packages, because I need pacemaker_remote. Since I use my compiled version, pacemaker sees no nodes!
corosync2 lists both hosts:

root@pacemaker1:/var/lib/pacemaker/cib# corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.181614346.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.181614346.ip (str) = r(0) ip(10.211.55.10)
runtime.totem.pg.mrp.srp.members.181614346.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.181614346.status (str) = joined
runtime.totem.pg.mrp.srp.members.181614347.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.181614347.ip (str) = r(0) ip(10.211.55.11)
runtime.totem.pg.mrp.srp.members.181614347.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.181614347.status (str) = joined

root@pacemaker1:/var/lib/pacemaker/cib# crm_mon -1
Last updated: Fri Jan 16 16:49:10 2015
Last change: Fri Jan 16 16:05:15 2015
Current DC: NONE
0 Nodes configured
0 Resources configured

uname -n returns pacemaker1 / pacemaker2. Logfile is attached.

corosync.conf:

totem {
    version: 2
    token: 5000

    # crypto_cipher and crypto_hash: Used for mutual node authentication.
    # If you choose to enable this, then do remember to create a shared
    # secret with corosync-keygen.
    # Enabling crypto_cipher requires also enabling of crypto_hash.
    crypto_cipher: none
    crypto_hash: none

    # interface: define at least one interface to communicate
    # over. If you define more than one interface stanza, you must
    # also set rrp_mode.
    interface {
        # Rings must be consecutively numbered, starting at 0.
        ringnumber: 0

        # This is normally the *network* address of the
        # interface to bind to. This ensures that you can use
        # identical instances of this configuration file
        # across all your cluster nodes, without having to
        # modify this option.
        bindnetaddr: 10.211.55.10

        # However, if you have multiple physical network
        # interfaces configured for the same subnet, then the
        # network address alone is not sufficient to identify
        # the interface Corosync should bind to. In that case,
        # configure the *host* address of the interface
        # instead:
        # bindnetaddr: 192.168.1.1

        # When selecting a multicast address, consider RFC
        # 2365 (which, among other things, specifies that
        # 239.255.x.x addresses are left to the discretion of
        # the network administrator). Do not reuse multicast
        # addresses across multiple Corosync clusters sharing
        # the same network.
        mcastaddr: 239.255.1.1

        # Corosync uses the port you specify here for UDP
        # messaging, and also the immediately preceding
        # port. Thus if you set this to 5405, Corosync sends
        # messages over UDP ports 5405 and 5404.
        mcastport: 5405

        # Time-to-live for cluster communication packets. The
        # number of hops (routers) that this ring will allow
        # itself to pass. Note that multicast routing must be
        # specifically enabled on most network routers.
        ttl: 1
    }
}

logging {
    # Log the source file and line where messages are being
    # generated. When in doubt, leave off. Potentially useful for
    # debugging.
    fileline: off

    # Log to standard error. When in doubt, set to no. Useful when
    # running in the foreground (when invoking corosync -f)
    to_stderr: no

    # Log to a log file. When set to no, the logfile option
    # must not be set.
Re: [Pacemaker] no nodes on both hosts
Hi,

I reinstalled Debian in the same VM and used the Debian pacemaker/corosync packages; everything works fine. Then I recompiled the newest pacemaker/corosync packages, with the same problem...

Jan 19 10:13:31 [24271] pacemaker2 cib: info: cib_process_request: Completed cib_modify operation for section nodes: OK (rc=0, origin=pacemaker2/crmd/3, version=0.0.0)

Does this line mean that the node is added to cib.xml? But there are no nodes:

root@pacemaker2:/var/lib/pacemaker/cib# cat cib.xml
<cib crm_feature_set="3.0.9" validate-with="pacemaker-2.0" epoch="0" num_updates="0" admin_epoch="0" cib-last-written="Mon Jan 19 10:13:30 2015">
  <configuration>
    <crm_config/>
    <nodes/>
    <resources/>
    <constraints/>
  </configuration>
</cib>

I also changed the permissions of the cib folder to 777... Can someone help me?

Thanks!

Sent: Friday, 16 January 2015 at 16:51
From: Thomas Manninger <dbgtmas...@gmx.at>
To: pacemaker@oss.clusterlabs.org
Subject: [Pacemaker] no nodes on both hosts

Hi,

I use Debian 7. At first I used the standard Debian packages, and pacemaker worked perfectly. Now I have compiled my own packages, because I need pacemaker_remote. Since I use my compiled version, pacemaker sees no nodes!
corosync2 lists both hosts:

root@pacemaker1:/var/lib/pacemaker/cib# corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.181614346.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.181614346.ip (str) = r(0) ip(10.211.55.10)
runtime.totem.pg.mrp.srp.members.181614346.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.181614346.status (str) = joined
runtime.totem.pg.mrp.srp.members.181614347.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.181614347.ip (str) = r(0) ip(10.211.55.11)
runtime.totem.pg.mrp.srp.members.181614347.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.181614347.status (str) = joined

root@pacemaker1:/var/lib/pacemaker/cib# crm_mon -1
Last updated: Fri Jan 16 16:49:10 2015
Last change: Fri Jan 16 16:05:15 2015
Current DC: NONE
0 Nodes configured
0 Resources configured

uname -n returns pacemaker1 / pacemaker2. Logfile is attached.

corosync.conf:

totem {
    version: 2
    token: 5000

    # crypto_cipher and crypto_hash: Used for mutual node authentication.
    # If you choose to enable this, then do remember to create a shared
    # secret with corosync-keygen.
    # Enabling crypto_cipher requires also enabling of crypto_hash.
    crypto_cipher: none
    crypto_hash: none

    # interface: define at least one interface to communicate
    # over. If you define more than one interface stanza, you must
    # also set rrp_mode.
    interface {
        # Rings must be consecutively numbered, starting at 0.
        ringnumber: 0

        # This is normally the *network* address of the
        # interface to bind to. This ensures that you can use
        # identical instances of this configuration file
        # across all your cluster nodes, without having to
        # modify this option.
        bindnetaddr: 10.211.55.10

        # However, if you have multiple physical network
        # interfaces configured for the same subnet, then the
        # network address alone is not sufficient to identify
        # the interface Corosync should bind to. In that case,
        # configure the *host* address of the interface
        # instead:
        # bindnetaddr: 192.168.1.1

        # When selecting a multicast address, consider RFC
        # 2365 (which, among other things, specifies that
        # 239.255.x.x addresses are left to the discretion of
        # the network administrator). Do not reuse multicast
        # addresses across multiple Corosync clusters sharing
        # the same network.
        mcastaddr: 239.255.1.1

        # Corosync uses the port you specify here for UDP
        # messaging, and also the immediately preceding
        # port. Thus if you set this to 5405, Corosync sends
        # messages over UDP ports 5405 and 5404.
        mcastport: 5405

        # Time-to-live for cluster communication packets. The
        # number of hops (routers) that this ring will allow
        # itself to pass. Note that multicast routing must be
        # specifically enabled on most network routers.
        ttl: 1
    }
}

logging {
    # Log the source file and line where messages are being
    # generated. When in doubt, leave off. Potentially useful for
    # debugging.
    fileline: off

    # Log to standard error. When in doubt, set to no. Useful when
    # running in the foreground (when invoking corosync -f)
    to_stderr: no

    # Log to a log file. When set to no, the logfile option
    # must not be set.
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log

    # Log to the system log daemon. When in doubt, set to yes.
    to_syslog: no

    # Log debug messages (very verbose). When in doubt, leave off.
    debug: on

    # Log messages with time stamps. When in doubt, set to on
    # (unless you are only logging to syslog, where double
    # timestamps can be annoying).
    timestamp: on

    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}

quorum {
    # Enable and configure quorum subsystem (default: off)
    # see also corosync.conf.5 and votequorum.5
    #provider: corosync_votequorum
}

Thanks!