Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
Andrew Beekhof wrote:

On Mar 22, 2007, at 2:13 AM, Alan Robertson wrote:

Doug Knight wrote:

Hi Andrew,

I had just started reviewing both of these scripts, and reviewed the Multistate and clone resource pages on the web site. It looks like multistate is how I need to handle it, but a couple of questions first.

1. I noticed that the write-up says the resource must come up on each of the servers in shadow mode first, then one gets promoted. Does this imply a start on both servers, and the OCF start function determining which server is active vs shadow (I'm picturing a check in the OCF script to determine postgresql standby mode = shadow/crm_master value low, and postgresql active mode = active/crm_master value high), then a promote to the active server?

2. I noticed that the drbd OCF script contains a notify function, where the Stateful OCF script does not. The notify function looks to be where the important actions are taken (calling drbd_start_phase_2, pre/post, etc). Is the notify function necessary, or is it sufficient in my case to handle it through the start|stop|promote|demote functions?

Thanks for your help,
Doug

Andrew's out for a while.

The start function starts you up in slave/secondary mode. All resources initially start up in slave mode. A set of servers is chosen to run the resources on (it might be one, two, the whole set, etc. depending on clone_max and clone_node_max and the usual constraints). They are started on the selected nodes using start.

During the start operation, you are given the chance to declare yourself ready to become master or not by using the crm_master command line tool. I believe that your resource can run that command any time it likes - for example at a monitor operation... But, it is mandatory that it runs it when it first starts up.

mandatory in the sense that nothing will get promoted until someone, somewhere runs it. but the exact timing is completely up to the user/admin/RA... it is even possible to run it manually if you have to

I originally assumed what you said, but the docs contradict that by calling it mandatory (and not qualifying the term). And the code seems to indicate that you can ONLY run it from an RA.

--
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions. - William Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
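A minimal sketch of the idea Doug describes, where the resource agent maps the PostgreSQL mode to a low or high promotion preference and reports it with crm_master during start or monitor. The score values, the function names and the mode strings are hypothetical, and the `-v` option is an assumption about the crm_master command line that should be checked against the installed version.

```python
import subprocess

# Hypothetical preference values; the thread only says "low" and "high".
STANDBY_SCORE = 5
ACTIVE_SCORE = 100


def master_preference(postgres_mode):
    """Map a PostgreSQL mode to a promotion preference score."""
    if postgres_mode == "active":
        return ACTIVE_SCORE
    if postgres_mode == "standby":
        return STANDBY_SCORE
    raise ValueError("unknown mode: %r" % (postgres_mode,))


def crm_master_command(score):
    """Build the argv for crm_master (the -v flag is an assumption)."""
    return ["crm_master", "-v", str(score)]


def publish_preference(postgres_mode):
    """Report the preference; intended to be called from the RA's start or monitor action."""
    subprocess.check_call(crm_master_command(master_preference(postgres_mode)))
```

Only publish_preference touches the cluster; the other two functions are pure, so the mapping can be checked without a running cluster.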
Re: [Linux-HA] Heartbeat compatibility question
Dejan Muhamedagic wrote:

Hmm, are you sure that your hardware is good and that it is well supported under Linux? Haven't you been able to find the reason for your computers crashing so often? BTW, you can always try the vanilla kernel and then bother people on the kernel list ;-)

So many tests have been run on the hardware over these 2 years...
- Hard drive checked (with the manufacturer's software): OK. New low-level initialisation and re-install. Same problem.
- Memory test for 72 hours with the latest memtest version. No error.
- Motherboard change. Same problem.
- Power supply change. Same problem.
- Adding an additional fan to the unit (even though the room is maintained at 22 degrees). Same problem.

As the problem occurs on both hosts but (usually) not simultaneously, I think it is a software problem. But I cannot get any info when the host crashes (nothing in the logs and only partial info on the console).

Maybe I can try a new kernel again before removing Debian...

Patrick

--
===
| Equipe M.O.S.T.      | http://most.hmg.inpg.fr  |
| Patrick BEGOU        |                          |
| LEGI                 | mailto:[EMAIL PROTECTED] |
| BP 53 X              | Tel 04 76 82 51 35       |
| 38041 GRENOBLE CEDEX | Fax 04 76 82 52 71       |
===

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
[Linux-HA] Virtual IP alias gets incremented
Hi All,

I am running Heartbeat 1.2.3-2 on Fedora Core 3 with the 2.6.11-1.35_FC3smp kernel. The problem is that when I fail over multiple times, the virtual interface alias eth0:1 becomes eth0:2. It keeps increasing by one on each failover. I have seen aliases as high as eth0:12.

Any help or pointers would be great.

Thanks
Abdul Khader
Re: [Linux-HA] Documentation for constraints
On Mar 24, 2007, at 2:03 PM, Dejan Muhamedagic wrote:
On Fri, Mar 23, 2007 at 02:08:17PM +0100, Andrew Beekhof wrote:
On Mar 22, 2007, at 7:40 PM, Dejan Muhamedagic wrote:
On Thu, Mar 22, 2007 at 06:15:41PM +0100, Ragnar Kjørstad wrote:
On Thu, Mar 22, 2007 at 02:05:57PM +0100, Dejan Muhamedagic wrote:
On Thu, Mar 22, 2007 at 02:21:23AM +0100, Ragnar Kjørstad wrote:

BTW: I just noticed that the DTD is also checked into Mercurial. So which is the authoritative version? Mercurial or wiki?

i think it's wiki, but i'm really not sure. alan?

mercurial is right.

is it possible to edit the dtd in wiki?

it shouldn't be.

The explanation for symmetrical is also not good enough, because it's not clear what the reverse constraint is.

sounds clear enough to me, but then it's only me :)

The unclear part is that there are 3 things that can be reversed: x vs y, start vs stop, or before vs after.

the first and the last don't make sense.

I've updated the DTD to specifically state that it is the action and type that is reversed in the symmetrical rule.

[...]

http://wiki.linux-ha.org/v2/dtd1.0/annotated#head-390c0a5ecce666978dab397e20d1575ff366c262 needs to explain if x colocated with y means the same thing as y colocated with x or not.

i believe that it should, but i'm not really sure. anybody?

Is it correct that <rsc_colocation id="foo" from="x" to="y" score="INFINITY"/> affects the score of x, but not y? Does that mean that y will be placed first, and x will follow?

no. it just means that they stay together.

What if y is placed on a node where x fails to start, will both x and y be migrated to another node?

yes. there were some changes though in 2.0.8 to make heartbeat more configurable here and to make it possible to stop one resource and leave the other in the started state, given that the order constraint is right.

Interesting. Any more information? The changelog includes
+ Support weak and uni-directional collocation constraints (FATE 300792).
but I see no mention of weak or uni-directional in the DTD or anywhere else in the documentation.

unfortunately our master of crm is not available right now. i'll try to explain what it is about: a group of resources or an ordered set of resources used to behave like a unit. one of them goes down, the others follow. that hasn't been quite right, because one would expect those resources which are under (in terms of order, started before) to keep running. in other words, a colocation should mean that the two things, if they run, must run on the same node, but it should not imply that they depend on each other. that was particularly important for non-grouped resources, because one common scenario is that two resources which are independent of each other both depend on a third one. i'm afraid that the documentation hasn't been updated.

What is the best practice use of colocation?

not sure if i understand this. basically, you just use them where you need them. hmm, that sounds stupid, but don't see any other way to say it :)

I should probably be more explicit: In an example with resources A (filesystem), B (database), C (webserver) where A must be started before B before C on the same host. What would the natural colocation rules be?

you're talking about order here. one can only assume that ABC should run on the same host.

I assume one colocation rule between A and B. Does the direction matter?

no.

it does. A -> B also says that B can run if A can't, but if B can't run then A can't either.

oops. that's overloading collocations, isn't it.

no more than previously where the reverse was _also_ true (if A can't run then neither can B).

prior to 2.0.8, colocate(A, B, INFINITY) meant that if A or B couldn't run somewhere... neither could, because the location that they shared was nowhere. this wasn't acceptable to most people, and so the new semantics were created that allowed B to keep running.
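The before and after semantics described above can be written as a small truth-table model. This is only an illustration of the rule as stated in the thread for an INFINITY colocation A -> B, not CRM code, and the function names are hypothetical.

```python
def can_run_before_2_0_8(a_ok, b_ok):
    """Old semantics: if either resource cannot run, neither runs."""
    both = a_ok and b_ok
    return {"A": both, "B": both}


def can_run_from_2_0_8(a_ok, b_ok):
    """New semantics for A -> B: A needs B, but B may keep running without A."""
    return {"A": a_ok and b_ok, "B": b_ok}


# A cannot run on the shared node, B can.
print(can_run_before_2_0_8(False, True))
print(can_run_from_2_0_8(False, True))
```

The two functions differ only in the case where A fails and B is healthy, which is exactly the case the 2.0.8 change was meant to address.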
i thought that the "this can run independently of the other" part would have been governed by an additional ordering constraint. something like:
- A and B are collocated
- A starts before B
then we can assume that B depends on A to run. wouldn't that be more logical? can you give one example where
Re: [Linux-HA] Virtual IP alias gets incremented
Abdul Khader wrote:

Hi All,

I am running Heartbeat 1.2.3-2 on Fedora Core 3 with the 2.6.11-1.35_FC3smp kernel. The problem is that when I fail over multiple times, the virtual interface alias eth0:1 becomes eth0:2. It keeps increasing by one on each failover. I have seen aliases as high as eth0:12.

Any help or pointers would be great.

There's something on your system causing that. What it is I couldn't say. But we do track which aliases are in use, and make sure we don't grab one that's already in use. Something in your environment is interfering with that logic.

--
Alan Robertson [EMAIL PROTECTED]
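As an illustration of the kind of alias tracking Alan mentions, here is a minimal sketch that picks the lowest alias number not already in use on an interface. It is not the code Heartbeat uses; the function name, the starting index of 0 and the input format (a plain list of interface names) are all assumptions.

```python
import re


def next_free_alias(base, names_in_use):
    """Return the lowest-numbered alias of `base` that is not in `names_in_use`."""
    pattern = re.compile(r"^%s:(\d+)$" % re.escape(base))
    used = set()
    for name in names_in_use:
        match = pattern.match(name)
        if match:
            used.add(int(match.group(1)))
    index = 0
    while index in used:
        index += 1
    return "%s:%d" % (base, index)


print(next_free_alias("eth0", ["lo", "eth0", "eth0:0", "eth0:1"]))
```

Under a scheme like this, the chosen number only climbs while lower-numbered aliases still appear to be in use, so comparing the list of existing aliases before and after a failover is one way to see what is holding them.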
Re: [Linux-HA] haclient.py requires a port to login ?
Karl Hanzel wrote:

'Just upgraded to heartbeat-2.0.8-2.el4.centos, and its new companion RPMs. 'Found the new haclient.py under /usr/lib64/heartbeat-gui/. It starts up fine, but upon login to my running cluster and supplying these args to the Login window:

127.0.0.1 / hacluster / hacluster's_pw

(which always worked previously), i'm now getting:

--
Traceback (most recent call last):
  File "/usr/lib64/heartbeat-gui/haclient.py", line 1598, in on_login
    if not manager.login(server, user, password):
  File "/usr/lib64/heartbeat-gui/haclient.py", line 1943, in login
    ret = mgmt_connect(ip, username, password, port)
TypeError: mgmt_connect() argument 4 must be string, not None
--

...and it fails to login/connect. If in the Server(:port): field of the Login window i specify 127.0.0.1:xyz it makes the login without the above complaint. That xyz can be literal... i seem to be able to supply anything (including a null string) there. So what's the rub ... why do i/we now have to specify a port? And if we do, what's the appropriate port to specify?

You don't have to supply a port... You _can_ specify a port if you like... You can check out the screencast here, and see what I mean...

http://linux-ha.org/Education/Newbie/IPaddrScreencast

In other words, It Works For Me (tm) ;-)

--
Alan Robertson [EMAIL PROTECTED]
Re: [Linux-HA] How to set this up correctly...?
Howard Yuan wrote:

Why does the service die when the crossover breaks?

Because I was planning on telling the services to look for MySQL on 10.0.0.5 (the floating IP) and if the crossover link breaks, the systems don't know how to get to 10.0.0.5 as they're looking to find it via the crossover link (LAN 2).

OK. I understand that. So, mysql needs connectivity via the crossover cable.

I need more configuration information to give a detailed answer.

Hum...what information would you need to understand it better? I can try to draw an ASCII picture of the network diagram if you need me to.

No, I think that was enough for now.

But, the short answer is 'run an R2 configuration with pingd properly configured for your problem'

I looked at this for a while and I can't figure out what you mean by R2.

I mean a "crm yes" configuration.

Also, on my heartbeat configuration right now, I'm using "crm no" to use ipfail, as I found on the mailing list that someone said that ipfail doesn't work with crm. Does CRM include a replacement for ipfail that works better?

It's called pingd ;-).

I'm not sure that either toolset precisely addresses your problem. What you likely really want is this:

- If one node is up, run all services there.
- If both nodes are up, and the two nodes can't talk across the crossover, then run both services on the machine with better connectivity to your clients.

It's the dual-connectivity test that you'd really like to have that pingd won't really handle. I believe that pingd treats all ping nodes the same. But, to truly solve this problem, you need to treat outside ping nodes differently from inside ping nodes.

You _can_ solve the problem in R2, you'll just have to write your own pingd replacement - since it doesn't have to be general, it'll be easy enough to write, but you'll still have to do it... You could do it all in the shell if you like...

Then you only tell heartbeat about one of the sets of ping nodes, and not the other set, and your tool would manage the other set.

But, none of this is likely to make much sense to you unless you understand the CRM's way of doing things through the rules in the CIB.

This is explained in some detail in my tutorial on R2. It's the newest tutorial on the http://linux-ha.org/HeartbeatTutorials page on the web site. The relevant section is slides 137-145. If you haven't used R2 at all, then maybe reviewing the presentation from the beginning would be good. There is a 90 minute video covering some basic things - given from these slides - and it's linked to from that same page. If you have trouble viewing the video directly, then try the abstract page - it has an embedded video viewer in Java at the bottom of the web page.

--
Alan Robertson [EMAIL PROTECTED]
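A minimal sketch of the kind of special-purpose pingd replacement Alan describes, one that scores inside and outside ping nodes separately instead of treating them all alike. Alan suggests the shell; Python is used here only for illustration. The host addresses, the weight of 100 per reachable host and all function names are hypothetical, and the step that would publish the scores to the cluster is left out because the thread does not specify it. The `ping -c 1 -W 1` invocation assumes a Linux iputils-style ping.

```python
import subprocess


def ping_once(host, timeout_s=1):
    """Return True if a single ICMP echo to `host` succeeds (Linux iputils ping assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def connectivity_score(hosts, probe=ping_once, weight=100):
    """Score a set of ping nodes: `weight` points per reachable host."""
    return weight * sum(1 for host in hosts if probe(host))


def classify(inside_hosts, outside_hosts, probe=ping_once):
    """Score the inside (crossover) and outside (client-facing) sets separately."""
    return {
        "inside": connectivity_score(inside_hosts, probe),
        "outside": connectivity_score(outside_hosts, probe),
    }


if __name__ == "__main__":
    # Hypothetical addresses: the peer on the crossover link and two client-side routers.
    print(classify(["10.0.0.2"], ["192.0.2.1", "192.0.2.2"]))
```

The probe is passed in as a parameter so the scoring logic can be exercised with a fake probe, without sending any packets.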
[Linux-HA] Monitoring resources
Hi,

I have Debian Etch with:
- heartbeat 2.0.7-2
- drbd0.7

I need to monitor postfix, apache2, ldap and courier. The cluster is already configured and works without problems, but I need to activate monitoring. I saw something about CRM on linux-ha.org, but I do not know how to set it up. Is there a page with some details about the implementation?

Thanks a lot!
Michael
[Linux-HA] ip address resources fail to start
Every time I start heartbeat I get an error stating that the IP address resource is stopped, and as a result heartbeat fails to start the virtual IP address.

haresources looks like:

    testa 172.16.9.148 drbddisk::drbd0 Filesystem::/dev/drbd0::/transactions::ext3

ha.cf looks like:

    crm false
    auto_failback off
    watchdog /dev/watchdog
    nodetesta
    nodetestb
    ping 172.16.9.161
    deadping 30
    debugfile /var/log/ha-debug
    logfile /var/log/ha-log
    logfacility local0
    keepalive 2
    deadtime 30
    warntime 10
    initdead 120
    udpport 694
    bcast eth0 eth1 # Linux

Any ideas appreciated.
[Linux-HA] Masters take long time to get back the ip from slave
I have configured HA with the latest version. When my master goes down, the slave automatically takes over the IP in 1 second, as I have set in my ha.cf file. But when the master comes back up, it takes around 40-50 seconds to take the IP back. Where can I set that time so that the master takes over in 1 second when it comes back up?

Thanks.
Re: [Linux-HA] haclient.py requires a port to login ?
Paulo F. Andrade wrote:

Actually a similar thing happens on my Mac; the difference is that I do have to specify the correct port to connect. Anything else won't do. On my Linux systems (running ubuntu and gentoo) I don't have to specify the port number, and it defaults to 5560.

Here's the output when I don't specify a port number:

Traceback (most recent call last):
  File "/sw/lib/heartbeat-gui/haclient.py", line 1598, in on_login
    if not manager.login(server, user, password):
  File "/sw/lib/heartbeat-gui/haclient.py", line 1943, in login
    ret = mgmt_connect(ip, username, password, port)
TypeError: mgmt_connect() argument 4 must be string, not None

It's no big deal, but it must be a bug.

Paulo F. Andrade [EMAIL PROTECTED]

Could one of you kindly make a bugzilla for this for me please?

--
Alan Robertson [EMAIL PROTECTED]
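The traceback suggests that the port reaches mgmt_connect() as None when the Server(:port) field contains no port. A minimal sketch of one possible way to guard against that is to split the field and fall back to a default port string. This is not the haclient.py code; the helper name is hypothetical, the default of 5560 is the value Paulo reports above, and IPv6 literals are not handled.

```python
DEFAULT_PORT = "5560"  # default reported in this thread; kept as a string


def split_server_field(field, default_port=DEFAULT_PORT):
    """Split 'host' or 'host:port' into (host, port), never returning None for port."""
    host, _, port = field.strip().partition(":")
    return host, (port if port else default_port)


print(split_server_field("127.0.0.1"))
print(split_server_field("127.0.0.1:5561"))
```

The port stays a string because the TypeError in the traceback says argument 4 of mgmt_connect() must be a string.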