Re: [Pacemaker] errors in corosync.log
On Sat, Jan 16, 2010 at 9:20 PM, Shravan Mishra shravan.mis...@gmail.com wrote:

Hi Guys,

I'm running the following versions of pacemaker and corosync:

    corosync=1.1.1-1-2
    pacemaker=1.0.9-2-1

Everything had been running fine for quite some time, but then I started seeing the following errors in the corosync logs:

    Jan 16 15:08:39 corosync [TOTEM ] Received message has invalid digest... ignoring.
    Jan 16 15:08:39 corosync [TOTEM ] Invalid packet data
    Jan 16 15:08:39 corosync [TOTEM ] Received message has invalid digest... ignoring.
    Jan 16 15:08:39 corosync [TOTEM ] Invalid packet data
    Jan 16 15:08:39 corosync [TOTEM ] Received message has invalid digest... ignoring.
    Jan 16 15:08:39 corosync [TOTEM ] Invalid packet data

I can perform all the crm shell commands and whatnot, but it's troubling that the above is happening. My crm_mon output looks good.

I also checked the authkey and ran md5sum on both nodes; it's the same. Then I stopped corosync, regenerated the authkey with corosync-keygen and copied it to the other machine, but I still get the above message in the corosync log.

Are you sure there's not a third node somewhere broadcasting on that mcast and port combination?

Is there anything other than the authkey that I should look into?

corosync.conf:

    # Please read the corosync.conf.5 manual page
    compatibility: whitetank

    totem {
        version: 2
        token: 3000
        token_retransmits_before_loss_const: 10
        join: 60
        consensus: 1500
        vsftype: none
        max_messages: 20
        clear_node_high_bit: yes
        secauth: on
        threads: 0
        rrp_mode: passive
        interface {
            ringnumber: 0
            bindnetaddr: 192.168.2.0
            #mcastaddr: 226.94.1.1
            broadcast: yes
            mcastport: 5405
        }
        interface {
            ringnumber: 1
            bindnetaddr: 172.20.20.0
            #mcastaddr: 226.94.1.1
            broadcast: yes
            mcastport: 5405
        }
    }

    logging {
        fileline: off
        to_stderr: yes
        to_logfile: yes
        to_syslog: yes
        logfile: /tmp/corosync.log
        debug: off
        timestamp: on
        logger_subsys {
            subsys: AMF
            debug: off
        }
    }

    service {
        name: pacemaker
        ver: 0
    }

    aisexec {
        user: root
        group: root
    }

    amf {
        mode: disabled
    }

Thanks
Shravan
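For reference, the authkey check and regeneration described above is typically done roughly as follows; this is a sketch assuming the default /etc/corosync key location and a hypothetical peer hostname (node2), neither of which is stated in the thread.

    # compare the key on both nodes; the checksums must match
    md5sum /etc/corosync/authkey

    # regenerate and redistribute (with corosync stopped on both nodes)
    corosync-keygen
    scp /etc/corosync/authkey node2:/etc/corosync/authkey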
Re: [Pacemaker] Split Site 2-way clusters
On Thu, Jan 14, 2010 at 11:44 PM, Miki Shapiro miki.shap...@coles.com.au wrote:

Confused. I *am* running DRBD in dual-master mode

/me cringes... this sounds to me like an impossibly dangerous idea. Can someone from linbit comment on this please? Am I imagining this?

(apologies, I should have mentioned that earlier), and there will be both WAN clients as well as local-to-datacenter clients writing to both nodes on both ends. It's safe to assume the clients will not know of the split. In a WAN split I need to ensure that the node whose idea of the drbd volume will be kept once resync happens stays up, and the node that'll get blown away and re-synced/overwritten becomes dead asap.

Won't you _always_ lose some data in a WAN split, though? AFAICS, what you're doing here is preventing "some" from becoming "lots". Is master/master really a requirement?

NodeX (successfully) taking on data from clients while in quorumless-freeze-still-providing-service, then discarding its hitherto collected client data when realizing the other node has quorum, isn't good.

Agreed - freeze isn't an option if you're doing master/master.

To recap what I understood so far:

1. CRM availability on the multicast channel drives DC election, but DC election is irrelevant to us here.

2. CRM availability on the multicast channel (rather than resource failure) drives who-is-in-quorum-and-who-is-not decisions [not sure here.. correct? Or does resource failure drive quorum?]

correct - quorum applies to node availability; resource failures have no impact (unless they lead to fencing, which then leads to the node leaving the membership)

3. Steve to clarify what happens quorum-wise if 1/3 nodes sees both others, but the other two only see the first ("broken triangle"), and whether this behaviour may differ based on whether the first node (which is different as it sees both others) happens to be the DC at the time or not.

Try it in a cluster of 3 VMs? Just use iptables rules to simulate the broken links.

Given that anyone who goes about building a production cluster would want to identify all likely failure modes and be able to predict how the cluster behaves in each one, is there any user-targeted doco/rtfm material one could read regarding how quorum establishment works in such scenarios?

I don't think corosync has such a doc at the moment.

Setting up a 3-way with intermittent WAN links without getting a clear understanding in advance of how the software will behave is ... scary :-)
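A minimal sketch of the iptables approach suggested above, for simulating a broken link between two test VMs; the IP address is a placeholder, not a value from this thread.

    # on node A: drop all traffic to and from node B (placeholder address)
    iptables -A INPUT  -s 192.168.122.102 -j DROP
    iptables -A OUTPUT -d 192.168.122.102 -j DROP

    # restore the link afterwards
    iptables -D INPUT  -s 192.168.122.102 -j DROP
    iptables -D OUTPUT -d 192.168.122.102 -j DROP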
Re: [Pacemaker] Pre-Announce: End of 0.6 support is near
On Tue, Jan 12, 2010 at 3:55 PM, Emmanuel Lesouef e.leso...@crbn.fr wrote:

On Tue, 12 Jan 2010 14:56:31 +0100, Michael Schwartzkopff mi...@multinet.de wrote:

On Tuesday, 12 January 2010 14:48:12, Emmanuel Lesouef wrote:

Hi,

We use a rather old (in fact, very old) combination: heartbeat 2.99 + openhpi 2.12. What do you suggest in order to upgrade to the latest version of pacemaker? Thanks.

http://www.clusterlabs.org/wiki/Upgrade

Thanks for your answer. I already saw: http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/ap-upgrade.html

In fact, my question wasn't about the upgrade process itself, but more about polling this list for caveats, advice or best practices when dealing with a rather old, uncommon configuration.

The biggest caveat is the networking issue that makes pacemaker 1.0 wire-incompatible with pacemaker 0.6 (and heartbeat 2.1.x). So rolling upgrades are out and you'd need to look at one of the other upgrade strategies.
Re: [Pacemaker] DC election with downed node in 2-way cluster
On Thu, Jan 14, 2010 at 4:40 AM, Miki Shapiro miki.shap...@coles.com.au wrote:

And the node really did power down?

Yes. 100% certain and positive. OFF.

But the other node didn't notice?!?

Its resources (drbd master and the fence clone) did notice.
Its DC-election mechanism did NOT notice (and the survivor didn't re-elect).
Its quorum-election mechanism did NOT notice (and the survivor still thinks it has quorum).
Logs attached.

Hmmm. Not much to see there. crmd gets the membership event and then just sort of stops. Could you try again with debug turned on in openais.conf please?

Keep in mind I'm relatively new to this. PEBKAC not entirely outside the realm of the possible ;)

Doesn't look like it, but you might want to try something a little more recent than 1.0.3.

Thanks!

-----Original Message-----
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Wednesday, 13 January 2010 7:26 PM
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] DC election with downed node in 2-way cluster

On Wed, Jan 13, 2010 at 9:12 AM, Miki Shapiro miki.shap...@coles.com.au wrote:

Halt = soft off - a natively issued poweroff command that shuts stuff down nicely, then powers the blade off.

And the node really did power down? But the other node didn't notice?!? That is insanely bad - looking forward to those logs.

Logs I'll send tomorrow (our timezone is just wrapping up for the day).

Yep, I'm actually an Aussie too... just not living there at the moment :-)
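For reference, turning on debug output as requested above usually means a small change to the logging stanza of openais.conf (corosync.conf uses the same keys). This is a sketch assuming a stock stanza, not the poster's actual file.

    logging {
        debug: on        # verbose, developer-oriented output
        to_syslog: yes
        timestamp: on
    }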
Re: [Pacemaker] Split Site 2-way clusters
On 2010-01-18 11:41, Colin wrote:

Hi All, we are currently looking at nearly the same issue; in fact I just wanted to start a similarly titled thread when I stumbled over these messages...

The setup we are evaluating is actually a 2*N-node cluster, i.e. two slightly separated sites with N nodes each. The main difference to an N-node cluster is that a failure of one of the two groups of nodes must be considered a single failure event [against which the cluster must protect, e.g. loss of power at one site].

Colin, the current approach is to utilize 2 Pacemaker clusters, each highly available in its own right, and employing manual failover. As described here: http://www.drbd.org/users-guide/s-pacemaker-floating-peers.html#s-pacemaker-floating-peers-site-fail-over

May be combined with DRBD resource stacking, obviously.

Given the fact that most organizations currently employ a non-automatic policy for site failover (as in, must be authorized by J. Random Vice President), this is a sane approach that works for most.

Automatic failover is a different matter, not just with regard to clustering (where neither Corosync nor Pacemaker nor Heartbeat currently supports any concept of sites), but also in terms of IP address failover, dynamic routing, etc.

Cheers, Florian
Re: [Pacemaker] Pre-Announce: End of 0.6 support is near
On 2010-01-18 11:18, Andrew Beekhof wrote:

Biggest caveat is the networking issue that makes pacemaker 1.0 wire-incompatible with pacemaker 0.6 (and heartbeat 2.1.x). So rolling upgrades are out and you'd need to look at one of the other upgrade strategies.

Even though I've bugged you about this repeatedly in the past, I'll reiterate that I think this non-support of rolling upgrades is a bad thing(tm). Just so someone puts this on the record. :)

Cheers, Florian
Re: [Pacemaker] [Linux-HA] Announce: Hawk (HA Web Konsole)
I look forward to taking this for a spin! Do we have a bugzilla component for it yet?

On Sat, Jan 16, 2010 at 2:14 PM, Tim Serong tser...@novell.com wrote:

Greetings All,

This is to announce the development of the Hawk project, a web-based GUI for Pacemaker HA clusters.

So, why another management tool, given that we already have the crm shell, the Python GUI, and DRBD MC? In order:

1) We have the usual rationale for a GUI over (or in addition to) a CLI tool; it is (or should be) easier to use, for a wider audience.

2) The Python GUI is not always easily installable/runnable (think: sysadmins with Windows desktops and/or people who don't want to, or can't, forward X).

3) Believe it or not, there are a number of cases where, citing security reasons, site policy prohibits ssh access to servers (which is what DRBD MC uses internally).

There are also some differing goals; Hawk is not intended to expose absolutely everything. There will be a point somewhere where you have to say "and now you must learn to use a shell". Likewise, Hawk is not intended to install the base cluster stack for you (whereas DRBD MC does a good job of this).

It's early days yet (no downloadable packages), but you can get the current source as follows:

    # hg clone http://hg.clusterlabs.org/pacemaker/hawk
    # cd hawk
    # hg update tip

This will give you a web-based GUI with a display roughly analogous to crm_mon, in terms of status of cluster resources. It will show you running/dead/standby nodes, and the resources (clones, groups, primitives) running on those nodes. It does not yet provide information about failed resources or nodes, other than the fact that they are not running. Display of nodes and resources is collapsible (collapsed by default), but if something breaks while you are looking at it, the display will expand to show the broken nodes and/or resources.

Hawk is intended to run on each node in your cluster. You can then access it by pointing your web browser at the IP address of any cluster node, or the address of any IPaddr(2) resource you may have configured.

Minimally, to see it in action, you will need the following packages and their dependencies (names per openSUSE/SLES):

    - ruby
    - rubygem-rails-2_3
    - rubygem-gettext_rails

Once you've got those installed, run the following command:

    # hawk/script/server

Then, point your browser at http://your-server:3000/ to see the status of your cluster.

Ultimately, hawk is intended to be installed and run as a regular system service via /etc/init.d/hawk. To do this, you will need the following additional packages:

    - lighttpd
    - lighttpd-mod_magnet
    - ruby-fcgi
    - rubygem-rake

Then, try the following, but READ THE MAKEFILE FIRST! make install (and the rest of the build system for that matter) is frightfully primitive at the moment:

    # make
    # sudo make install
    # /etc/init.d/hawk start

Then, point your browser at http://your-server:/ to see the status of your cluster.

Assuming you've read this far, what next?

- In the very near future (but probably not next week, because I'll be busy at linux.conf.au) you can expect to see further documentation and roadmap info up on the clusterlabs.org wiki.
- The immediate goal is to obtain feature parity with crm_mon (completing status display, adding error/failure messages).
- Various pieces of scaffolding need to be put in place (login page, access via HTTPS, clean up build/packaging, theming, etc.)
- After status display, the following major areas of functionality are:
  - Basic operator tasks (stop/start/migrate resource, standby/online node, etc.)
  - Explore failure scenarios (shadow CIB magic to see what would happen if a node/resource failed).
  - Ability to actually configure resources and nodes.

Please direct comments, feedback, questions, etc. to tser...@novell.com and/or the Pacemaker mailing list.

Thank you for your attention.

Regards,
Tim

--
Tim Serong tser...@novell.com
Senior Clustering Engineer, Novell Inc.
[Pacemaker] Announce: Pacemaker 1.0.7 (stable) Released
The latest installment of the Pacemaker 1.0 stable series is now ready for general consumption. In this release, we've made a number of improvements to clone handling - particularly the way ordering constraints are processed - as well as some really nice improvements to the shell.

The next 1.0 release is anticipated in mid-March. We will be switching to a bi-monthly release schedule to begin focusing on development for the next stable series (more details soon). So, if you have feature requests, now is the time to voice them and/or provide patches :-)

Pre-built packages for Pacemaker and its immediate dependencies are currently building and will be available for openSUSE, SLES, Fedora, RHEL and CentOS from the ClusterLabs Build Area (http://www.clusterlabs.org/rpm) shortly.

Read the full announcement at: http://theclusterguy.clusterlabs.org/post/340780359/pacemaker-1-0-7-released

General installation instructions are available from the ClusterLabs wiki: http://clusterlabs.org/wiki/Install

--
Andrew
Re: [Pacemaker] Split Site 2-way clusters
On Mon, Jan 18, 2010 at 11:52 AM, Florian Haas florian.h...@linbit.com wrote:

the current approach is to utilize 2 Pacemaker clusters, each highly available in its own right, and employing manual failover. As described here:

Thanks for the pointer!

Perhaps "site" is not quite the correct term for our setup, where we still have (multiple) Gbit-or-faster ethernet links; think fire areas, at most in adjacent buildings. For the next step up, two geographically distinct sites, I agree that manual failover is more appropriate, but we feel that our case of the fire areas should still be handled automatically...(?)

Can anybody judge how difficult it would be to integrate some kind of quorum-server support into the cluster? (All cluster nodes attempt a quorum reservation; the node that gets it has 1.5 or 2 votes towards the quorum, rather than just one. This would ensure continued operation in the case of a) a fire area losing power, b) the separate quorum server failing, or c) the cross-fire-area cluster interconnects failing (but not more than one failure at a time).)

Regards, Colin
Re: [Pacemaker] Announce: Pacemaker 1.0.7 (stable) Released
-----Original Message-----
From: Andrew Beekhof and...@beekhof.net
Sent: 18.01.10 12:43:30
To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
Subject: [Pacemaker] Announce: Pacemaker 1.0.7 (stable) Released

The latest installment of the Pacemaker 1.0 stable series is now ready for general consumption.

Great.

Pre-built packages for Pacemaker and its immediate dependencies are currently building and will be available for openSUSE, SLES, Fedora, RHEL, CentOS from the ClusterLabs Build Area (http://www.clusterlabs.org/rpm) shortly.

Please don't forget openSUSE 10.2. I'm waiting... ;-)

Best regards + thanks,
Andreas
Re: [Pacemaker] errors in corosync.log
Hi,

Since the interfaces on the two nodes are connected via a crossover cable, there is no chance of that happening. And since I'm using rrp_mode: passive, the other ring, i.e. ring 1, will come into play only when ring 0 fails, I assume. I say this because the ring 1 interface is on the network.

One interesting thing that I observed was that libtomcrypt is being used for crypto because I have secauth: on, but I couldn't find that library on my machine. I'm wondering if it's because of that.

Basically we are using 3 interfaces: eth0, eth1 and eth2. eth0 and eth2 are for ring 0 and ring 1 respectively; eth1 is the primary interface.

This is what my drbd.conf looks like:

    # please have a look at the example configuration file in
    # /usr/share/doc/drbd82/drbd.conf
    #
    global {
        usage-count no;
    }
    common {
        protocol C;
        startup {
            wfc-timeout 120;
            degr-wfc-timeout 120;
        }
    }
    resource var_nsm {
        syncer {
            rate 333M;
        }
        handlers {
            fence-peer /usr/lib/drbd/crm-fence-peer.sh;
            after-resync-target /usr/lib/drbd/crm-unfence-peer.sh;
        }
        net {
            after-sb-1pri discard-secondary;
        }
        on node1.itactics.com {
            device /dev/drbd1;
            disk /dev/sdb3;
            address 172.20.20.1:7791;
            meta-disk internal;
        }
        on node2.itactics.com {
            device /dev/drbd1;
            disk /dev/sdb3;
            address 172.20.20.2:7791;
            meta-disk internal;
        }
    }

The eth0s of the two nodes are connected via crossover as I mentioned, and eth1 and eth2 are on the network.

I'm not a networking expert, but is it possible that a broadcast done by, let's say, a node not in my cluster will still reach my nodes through the other interfaces which are attached to the network?

We in dev and the QA guys are testing this in parallel. Let's say there is a QA cluster of two nodes and a dev cluster of two nodes, the interfaces for both of them are hooked up as I mentioned above, and corosync.conf for both clusters has bindnetaddr: 192.168.2.0. Is there a possibility of bad messages for one cluster caused by the other?

We are in the final leg of testing and this came up.

Thanks for the help.
Shravan

On Mon, Jan 18, 2010 at 2:58 AM, Andrew Beekhof and...@beekhof.net wrote:

On Sat, Jan 16, 2010 at 9:20 PM, Shravan Mishra shravan.mis...@gmail.com wrote:

Hi Guys,

I'm running the following versions of pacemaker and corosync:

    corosync=1.1.1-1-2
    pacemaker=1.0.9-2-1

Everything had been running fine for quite some time, but then I started seeing the following errors in the corosync logs:

    Jan 16 15:08:39 corosync [TOTEM ] Received message has invalid digest... ignoring.
    Jan 16 15:08:39 corosync [TOTEM ] Invalid packet data
    Jan 16 15:08:39 corosync [TOTEM ] Received message has invalid digest... ignoring.
    Jan 16 15:08:39 corosync [TOTEM ] Invalid packet data
    Jan 16 15:08:39 corosync [TOTEM ] Received message has invalid digest... ignoring.
    Jan 16 15:08:39 corosync [TOTEM ] Invalid packet data

I can perform all the crm shell commands and whatnot, but it's troubling that the above is happening. My crm_mon output looks good.

I also checked the authkey and ran md5sum on both nodes; it's the same. Then I stopped corosync, regenerated the authkey with corosync-keygen and copied it to the other machine, but I still get the above message in the corosync log.

Are you sure there's not a third node somewhere broadcasting on that mcast and port combination?

Is there anything other than the authkey that I should look into?

corosync.conf:

    # Please read the corosync.conf.5 manual page
    compatibility: whitetank

    totem {
        version: 2
        token: 3000
        token_retransmits_before_loss_const: 10
        join: 60
        consensus: 1500
        vsftype: none
        max_messages: 20
        clear_node_high_bit: yes
        secauth: on
        threads: 0
        rrp_mode: passive
        interface {
            ringnumber: 0
            bindnetaddr: 192.168.2.0
            #mcastaddr: 226.94.1.1
            broadcast: yes
            mcastport: 5405
        }
        interface {
            ringnumber: 1
            bindnetaddr: 172.20.20.0
            #mcastaddr: 226.94.1.1
            broadcast: yes
            mcastport: 5405
        }
    }

    logging {
        fileline: off
        to_stderr: yes
        to_logfile: yes
        to_syslog: yes
        logfile: /tmp/corosync.log
        debug: off
        timestamp: on
        logger_subsys {
            subsys: AMF
            debug: off
        }
    }

    service {
        name: pacemaker
        ver: 0
    }

    aisexec {
        user: root
        group: root
    }
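As a way of checking the question above about stray broadcasts reaching the networked interfaces, one could simply capture the totem port on those interfaces; this is a suggested check, not something from the original thread, and the interface names are taken from the description above.

    # watch for corosync/totem traffic arriving on the networked interfaces;
    # packets from unexpected source addresses would point to another cluster
    tcpdump -ni eth1 udp port 5405
    tcpdump -ni eth2 udp port 5405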
[Pacemaker] mcast vs broadcast
Hi all,

Following is my corosync.conf. Even though broadcast is enabled, I see "mcasted" messages like these in corosync.log. Is that OK, given that broadcast is on and not mcast?

    Jan 18 09:50:40 corosync [TOTEM ] mcasted message added to pending queue
    Jan 18 09:50:40 corosync [TOTEM ] mcasted message added to pending queue
    Jan 18 09:50:40 corosync [TOTEM ] Delivering 171 to 173
    Jan 18 09:50:40 corosync [TOTEM ] Delivering MCAST message with seq 172 to pending delivery queue
    Jan 18 09:50:40 corosync [TOTEM ] Delivering MCAST message with seq 173 to pending delivery queue
    Jan 18 09:50:40 corosync [TOTEM ] Received ringid(192.168.2.1:168) seq 172
    Jan 18 09:50:40 corosync [TOTEM ] Received ringid(192.168.2.1:168) seq 172
    Jan 18 09:50:40 corosync [TOTEM ] Received ringid(192.168.2.1:168) seq 173
    Jan 18 09:50:40 corosync [TOTEM ] Received ringid(192.168.2.1:168) seq 173
    Jan 18 09:50:40 corosync [TOTEM ] releasing messages up to and including 172
    Jan 18 09:50:40 corosync [TOTEM ] releasing messages up to and including 173

    # Please read the corosync.conf.5 manual page
    compatibility: whitetank

    totem {
        version: 2
        token: 3000
        token_retransmits_before_loss_const: 10
        join: 60
        consensus: 1500
        vsftype: none
        max_messages: 20
        clear_node_high_bit: yes
        secauth: on
        threads: 0
        rrp_mode: passive
        interface {
            ringnumber: 0
            bindnetaddr: 192.168.2.0
            # mcastaddr: 226.94.1.1
            broadcast: yes
            mcastport: 5405
        }
        interface {
            ringnumber: 1
            bindnetaddr: 172.20.20.0
            #mcastaddr: 226.94.2.1
            broadcast: yes
            mcastport: 5405
        }
    }

    logging {
        fileline: off
        to_stderr: yes
        to_logfile: yes
        to_syslog: yes
        logfile: /tmp/corosync.log
        debug: on
        timestamp: on
        logger_subsys {
            subsys: AMF
            debug: off
        }
    }

    service {
        name: pacemaker
        ver: 0
    }

    aisexec {
        user: root
        group: root
    }

    amf {
        mode: disabled
    }

Thanks
Shravan
[Pacemaker] 1.0.7 upgraded, restarting resources problem
Hi,

I have one m/s drbd resource and one Xen instance on top. Both m/s halves are primary. When I restart the node that's _not_ hosting the Xen instance (ibm1), pacemaker restarts the running Xen instance on the other node (ibm2). There is no need to do that. I thought it got fixed (http://developerbugs.linux-foundation.org/show_bug.cgi?id=2153). Didn't it?

Here is my config once more. Please note the WARNING showed up only after the upgrade. (BTW, setting the drbd0predHosting score to 0 doesn't restart it. But it doesn't help resource ordering either.)

    [r...@ibm1 etc]# crm configure show
    WARNING: notify: operation name not recognized
    node $id=3d430f49-b915-4d52-a32b-b0799fa17ae7 ibm2
    node $id=4b2047c8-f3a0-4935-84a2-967b548598c9 ibm1
    primitive Hosting ocf:heartbeat:Xen \
        params xmfile=/etc/xen/Hosting.cfg shutdown_timeout=303 \
        meta target-role=Started allow-migrate=true is-managed=true \
        op monitor interval=120s timeout=506s start-delay=5s \
        op migrate_to interval=0s timeout=304s \
        op migrate_from interval=0s timeout=304s \
        op stop interval=0s timeout=304s \
        op start interval=0s timeout=202s
    primitive drbd_r0 ocf:linbit:drbd \
        params drbd_resource=r0 \
        op monitor interval=15s role=Master timeout=30s \
        op monitor interval=30s role=Slave timeout=30s \
        op stop interval=0s timeout=501s \
        op notify interval=0s timeout=90s \
        op demote interval=0s timeout=90s \
        op promote interval=0s timeout=90s \
        op start interval=0s timeout=255s
    ms ms_drbd_r0 drbd_r0 \
        meta notify=true master-max=2 inteleave=true is-managed=true target-role=Started
    order drbd0predHosting inf: ms_drbd_r0:promote Hosting:start
    property $id=cib-bootstrap-options \
        dc-version=1.0.7-b1191b11d4b56dcae8f34715d52532561b875cd5 \
        cluster-infrastructure=Heartbeat \
        stonith-enabled=false \
        no-quorum-policy=ignore \
        default-resource-stickiness=10 \
        last-lrm-refresh=1263845352

All I want is to have just the one Hosting resource started, after drbd has been promoted (made primary) on the node where it is starting. Please advise me if you can.

Thank you, regards,
M.
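For illustration only (this is not from the thread and not a confirmed fix for the restart problem), the pattern usually shown for running a resource on top of a DRBD master combines the promote/start ordering above with a colocation on the Master role; the resource names below are simply reused from the configuration above.

    colocation HostingOnDrbdMaster inf: Hosting ms_drbd_r0:Master
    order drbd0predHosting inf: ms_drbd_r0:promote Hosting:start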
Re: [Pacemaker] errors in corosync.log
One possibility is you have a different cluster in your network on the same multicast address and port.

Regards,
-steve

On Sat, 2010-01-16 at 15:20 -0500, Shravan Mishra wrote:

Hi Guys,

I'm running the following versions of pacemaker and corosync:

    corosync=1.1.1-1-2
    pacemaker=1.0.9-2-1

Everything had been running fine for quite some time, but then I started seeing the following errors in the corosync logs:

    Jan 16 15:08:39 corosync [TOTEM ] Received message has invalid digest... ignoring.
    Jan 16 15:08:39 corosync [TOTEM ] Invalid packet data
    Jan 16 15:08:39 corosync [TOTEM ] Received message has invalid digest... ignoring.
    Jan 16 15:08:39 corosync [TOTEM ] Invalid packet data
    Jan 16 15:08:39 corosync [TOTEM ] Received message has invalid digest... ignoring.
    Jan 16 15:08:39 corosync [TOTEM ] Invalid packet data

I can perform all the crm shell commands and whatnot, but it's troubling that the above is happening. My crm_mon output looks good.

I also checked the authkey and ran md5sum on both nodes; it's the same. Then I stopped corosync, regenerated the authkey with corosync-keygen and copied it to the other machine, but I still get the above message in the corosync log.

Is there anything other than the authkey that I should look into?

corosync.conf:

    # Please read the corosync.conf.5 manual page
    compatibility: whitetank

    totem {
        version: 2
        token: 3000
        token_retransmits_before_loss_const: 10
        join: 60
        consensus: 1500
        vsftype: none
        max_messages: 20
        clear_node_high_bit: yes
        secauth: on
        threads: 0
        rrp_mode: passive
        interface {
            ringnumber: 0
            bindnetaddr: 192.168.2.0
            #mcastaddr: 226.94.1.1
            broadcast: yes
            mcastport: 5405
        }
        interface {
            ringnumber: 1
            bindnetaddr: 172.20.20.0
            #mcastaddr: 226.94.1.1
            broadcast: yes
            mcastport: 5405
        }
    }

    logging {
        fileline: off
        to_stderr: yes
        to_logfile: yes
        to_syslog: yes
        logfile: /tmp/corosync.log
        debug: off
        timestamp: on
        logger_subsys {
            subsys: AMF
            debug: off
        }
    }

    service {
        name: pacemaker
        ver: 0
    }

    aisexec {
        user: root
        group: root
    }

    amf {
        mode: disabled
    }

Thanks
Shravan
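If another cluster really does share the segment, one common way to keep the two apart is to give each cluster its own mcastaddr/mcastport pair in the interface stanza (or, when staying with broadcast: yes, at least a distinct mcastport per cluster). The values below are illustrative only, not taken from this thread.

    interface {
        ringnumber: 0
        bindnetaddr: 192.168.2.0
        mcastaddr: 226.94.1.1   # use a different address on the other cluster
        mcastport: 5405         # and/or a different port per cluster
    }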
Re: [Pacemaker] mcast vs broadcast
On Mon, 2010-01-18 at 11:25 -0500, Shravan Mishra wrote:

Hi all,

Following is my corosync.conf. Even though broadcast is enabled, I see "mcasted" messages like these in corosync.log. Is that OK, given that broadcast is on and not mcast?

Yes, you are using broadcast; the debug output just doesn't print a special case for broadcast (but it really is broadcasting). This output is debug output meant for developer consumption. It is really not all that useful for end users.

    Jan 18 09:50:40 corosync [TOTEM ] mcasted message added to pending queue
    Jan 18 09:50:40 corosync [TOTEM ] mcasted message added to pending queue
    Jan 18 09:50:40 corosync [TOTEM ] Delivering 171 to 173
    Jan 18 09:50:40 corosync [TOTEM ] Delivering MCAST message with seq 172 to pending delivery queue
    Jan 18 09:50:40 corosync [TOTEM ] Delivering MCAST message with seq 173 to pending delivery queue
    Jan 18 09:50:40 corosync [TOTEM ] Received ringid(192.168.2.1:168) seq 172
    Jan 18 09:50:40 corosync [TOTEM ] Received ringid(192.168.2.1:168) seq 172
    Jan 18 09:50:40 corosync [TOTEM ] Received ringid(192.168.2.1:168) seq 173
    Jan 18 09:50:40 corosync [TOTEM ] Received ringid(192.168.2.1:168) seq 173
    Jan 18 09:50:40 corosync [TOTEM ] releasing messages up to and including 172
    Jan 18 09:50:40 corosync [TOTEM ] releasing messages up to and including 173

    # Please read the corosync.conf.5 manual page
    compatibility: whitetank

    totem {
        version: 2
        token: 3000
        token_retransmits_before_loss_const: 10
        join: 60
        consensus: 1500
        vsftype: none
        max_messages: 20
        clear_node_high_bit: yes
        secauth: on
        threads: 0
        rrp_mode: passive
        interface {
            ringnumber: 0
            bindnetaddr: 192.168.2.0
            # mcastaddr: 226.94.1.1
            broadcast: yes
            mcastport: 5405
        }
        interface {
            ringnumber: 1
            bindnetaddr: 172.20.20.0
            #mcastaddr: 226.94.2.1
            broadcast: yes
            mcastport: 5405
        }
    }

    logging {
        fileline: off
        to_stderr: yes
        to_logfile: yes
        to_syslog: yes
        logfile: /tmp/corosync.log
        debug: on
        timestamp: on
        logger_subsys {
            subsys: AMF
            debug: off
        }
    }

    service {
        name: pacemaker
        ver: 0
    }

    aisexec {
        user: root
        group: root
    }

    amf {
        mode: disabled
    }

Thanks
Shravan