Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-27 Thread Alan Robertson
Andrew Beekhof wrote:
 
 On Mar 22, 2007, at 2:13 AM, Alan Robertson wrote:
 
 Doug Knight wrote:
 Hi Andrew,
 I had just started reviewing both of these scripts, and reviewed the
 Multistate and clone resource pages on the web site. It looks like
 multistate is how I need to handle it, but a couple of questions first.

 1. I noticed that the write-up says the resource must come up on each of
 the servers in shadow mode first, then one gets promoted. Does this
 imply a start on both servers, and the OCF start function determining
 which server is active vs shadow (I'm picturing a check in the OCF
 script to determine postgresql standby mode = shadow/crm_master value
 low, and postgresql active mode = active/crm_master value high), then a
 promote to the active server?

 2. I noticed that the drbd OCF script contains a notify function,
 whereas the Stateful OCF script does not. The notify function looks to be
 where the important actions are taken (calling drbd_start_phase_2,
 pre/post, etc). Is the notify function necessary, or is it sufficient in
 my case to handle it through the start|stop|promote|demote functions?

 Thanks for your help,
 Doug

 Andrew's out for a while.

 The start function starts you up in slave/secondary mode.  All resources
 initially start up in slave mode.

 A set of servers is chosen to run the resources on (it might be one,
 two, the whole set, etc. depending on clone_max and clone_node_max and
 the usual constraints).

 They are started on the selected nodes using the start operation.

 During the start operation, you are given the chance to declare yourself
 ready to become master or not by using the crm_master command line tool.

 I believe that your resource can run that command any time it likes,
 for example during a monitor operation...  But it is mandatory that it
 run it when it first starts up.
 
 mandatory in the sense that nothing will get promoted until someone,
 somewhere runs it.
 but the exact timing is completely up to the user/admin/RA... it is even
 possible to run it manually if you have to.

I originally assumed what you said, but the docs contradict that by
calling it mandatory (and not qualifying the term).  And the code seems
to indicate that you can ONLY run it from an RA.
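As a rough illustration of the behaviour described above (a Python model, not CRM code): every instance starts as a slave, and nothing is promoted until some master preference exists, regardless of when or by whom it was recorded. The class and function names, the "only a positive preference counts" rule and the `max_masters` limit are assumptions made for this sketch; in heartbeat itself the preference is recorded with the crm_master tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    """One copy of a hypothetical multistate resource on one node."""
    node: str
    role: str = "Stopped"
    # Preference recorded via crm_master; None means nobody has set one yet.
    master_pref: Optional[int] = None

    def start(self, pref: Optional[int] = None) -> None:
        # Every instance comes up as a slave first.
        self.role = "Slave"
        # The RA *may* record a preference during start; it could equally do
        # so later (e.g. from monitor), or an admin could set it by hand.
        if pref is not None:
            self.master_pref = pref


def choose_masters(instances: list[Instance], max_masters: int = 1) -> list[str]:
    """Promote the slaves with the highest positive preference, up to max_masters.

    With no preference recorded anywhere, nothing is promoted.
    """
    candidates = [i for i in instances
                  if i.role == "Slave" and i.master_pref is not None and i.master_pref > 0]
    candidates.sort(key=lambda i: i.master_pref, reverse=True)
    promoted = candidates[:max_masters]
    for inst in promoted:
        inst.role = "Master"
    return [inst.node for inst in promoted]


a, b = Instance("node1"), Instance("node2")
a.start()
b.start()
print(choose_masters([a, b]))  # [] : no preference recorded yet
b.master_pref = 100            # set later, e.g. by a monitor op or by hand
print(choose_masters([a, b]))  # ['node2']
```

The point of the model is only the timing question in this thread: start, monitor or a manual command can all supply the preference, and promotion simply waits until one of them has.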

-- 
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions. - William
Wilberforce
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-HA] Heartbeat compatibility question

2007-03-27 Thread Patrick Begou

Dejan Muhamedagic wrote:
 

Hmm, are you sure that your hardware is good and that it is well
supported under Linux? Haven't you been able to find the reason
for your computers crashing so often? BTW, you can always try the
vanilla kernel and then bother people on the kernel list ;-)



So many tests have been run on the hardware over these 2 years...
- hard drive checked with the manufacturer's software: OK. New low-level
initialisation and reinstall. Same problem.
- memory test for 72 hours with the latest memtest version. No error.
- motherboard change. Same problem.
- power supply change. Same problem.
- additional fan added to the unit (even though the room is kept at
22 degrees). Same problem.


As the problem occurs on both hosts, but (usually) not simultaneously, I
think it is a software problem.  But I cannot get any info when the host
crashes (nothing in the logs and only partial info on the console).


Maybe I can try a new kernel once more before removing Debian...

Patrick
--
===
|  Equipe M.O.S.T. | http://most.hmg.inpg.fr  |
|  Patrick BEGOU   |      |
|  LEGI| mailto:[EMAIL PROTECTED] |
|  BP 53 X | Tel 04 76 82 51 35   |
|  38041 GRENOBLE CEDEX| Fax 04 76 82 52 71   |
===
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


[Linux-HA] Virtual IP alias gets incremented

2007-03-27 Thread Abdul Khader

Hi All,
I am running Heartbeat 1.2.3-2 on Fedora Core 3 with the 2.6.11-1.35_FC3smp
kernel. The problem is that when I fail over multiple times, the virtual
interface alias eth0:1 becomes eth0:2. It keeps increasing by one
on each failover. I have seen a virtual alias as high as eth0:12.


Any help or pointers would be great.

Thanks
Abdul Khader



Re: [Linux-HA] Documentation for constraints

2007-03-27 Thread Andrew Beekhof


On Mar 24, 2007, at 2:03 PM, Dejan Muhamedagic wrote:
On Fri, Mar 23, 2007 at 02:08:17PM +0100, Andrew Beekhof wrote:
On Mar 22, 2007, at 7:40 PM, Dejan Muhamedagic wrote:
On Thu, Mar 22, 2007 at 06:15:41PM +0100, Ragnar Kj?rstad wrote:
On Thu, Mar 22, 2007 at 02:05:57PM +0100, Dejan Muhamedagic wrote:
On Thu, Mar 22, 2007 at 02:21:23AM +0100, Ragnar Kj?rstad wrote:

BTW: I just noticed that the DTD is also checked into mercurial. So which
is the authoritative version? mercurial or wiki?

i think it's wiki, but i'm really not sure. alan?

mercurial is

right. is it possible to edit the dtd in wiki? it shouldn't be.

The explanation for symmetrical is also not good enough, because it's
not clear what the reverse constraint is.

sounds clear enough to me, but then it's only me :)

The unclear part is that there are 3 things that can be reversed:
x vs y, start vs stop, or before vs after.

the first and the last don't make sense.

I've updated the DTD to specifically state that it is the action and
type that are reversed in the symmetrical rule.
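Read that way, the symmetrical counterpart of "A start before B" is "A stop after B", which is the same statement as "B stop before A". A minimal Python sketch of that reversal; the `Order` tuple and its field names are a hypothetical representation chosen for readability, not the DTD's exact attribute set, and the sketch assumes the action applies to both resources.

```python
from typing import NamedTuple

# Assumed pairs of opposite actions and types for this sketch.
OPPOSITE_ACTION = {"start": "stop", "stop": "start",
                   "promote": "demote", "demote": "promote"}
OPPOSITE_TYPE = {"before": "after", "after": "before"}


class Order(NamedTuple):
    """Reads as '<rsc> <action> <type> <other>', e.g. 'A start before B'."""
    rsc: str
    action: str
    type: str
    other: str

    def __str__(self) -> str:
        return f"{self.rsc} {self.action} {self.type} {self.other}"


def symmetric_counterpart(c: Order) -> Order:
    """Reverse the action and the type; the resources stay where they are."""
    return Order(c.rsc, OPPOSITE_ACTION[c.action], OPPOSITE_TYPE[c.type], c.other)


def normalise(c: Order) -> Order:
    """Rewrite an 'after' constraint as the equivalent 'before' constraint."""
    if c.type == "after":
        return Order(c.other, c.action, "before", c.rsc)
    return c


start_rule = Order("A", "start", "before", "B")
stop_rule = symmetric_counterpart(start_rule)
print(stop_rule)             # A stop after B
print(normalise(stop_rule))  # B stop before A
```

Swapping x and y instead would give the same result as the normalised form, which is one reason the "which part is reversed" question caused confusion.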

[...]

http://wiki.linux-ha.org/v2/dtd1.0/annotated#head-390c0a5ecce666978dab397e20d1575ff366c262

needs to explain whether x colocated with y means the same thing as y
colocated with x or not.

i believe that it should, but i'm not really sure. anybody?

Is it correct that

<rsc_colocation id="foo" from="x" to="y" score="INFINITY"/>

affects the score of x, but not y?

Does that mean that y will be placed first, and x will follow?

no. it just means that they stay together.

What if y is placed on a node where x fails to start: will both x and y
be migrated to another node?

yes. there were some changes in 2.0.8, though, to make heartbeat
more configurable here and to make it possible to stop one
resource and leave the other in the started state, provided the
order constraint is right.

Interesting. Any more information?

The changelog includes

+ Support weak and uni-directional collocation constraints (FATE 300792).

but I see no mention of weak or uni-directional in the DTD or
anywhere else in the documentation.

unfortunately our master of crm is not available right now. i'll
try to explain what it is about: a group of resources or an
ordered set of resources used to behave like a unit: if one of them
went down, the others followed. that wasn't quite right,
because one would expect those resources which are underneath (in
terms of order, started before) to keep running. in other words, a
colocation should mean that the two things, if they run, must run
on the same node, but it should not imply that they depend on each
other. that was particularly important for non-grouped resources,
because one common scenario is that two resources which are
independent of each other both depend on a third one.

i'm afraid that the documentation hasn't been updated.

What is the best practice use of colocation?

not sure if i understand this. basically, you just use them where
you need them. hmm, that sounds stupid, but i don't see any other
way to say it :)

I should probably be more explicit:

In an example with resources A (filesystem), B (database) and C (webserver),
where A must be started before B, and B before C, on the same host, what
would the natural colocation rules be?

you're talking about order here. one can only assume that A, B and C
should run on the same host.

I assume one colocation rule between A and B. Does the direction
matter?

no.


it does

A - B also says that B can run if A can't, but if B can't run then A
can't either.


oops. that's overloading collocations, isn't it?


no more than previously, where the reverse was _also_ true (if A can't
run then neither can B)


prior to 2.0.8, colocate(A, B, INFINITY) meant that if A or B couldn't
run somewhere... neither could, because the location that they
shared was nowhere.


this wasn't acceptable to most people, and so the new semantics were
created that allowed B to keep running.
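The difference between the old and new semantics can be modelled with a tiny placement function. This is a sketch only, assuming a single INFINITY constraint colocate(A, B) and, for each resource, a set of nodes where it could run on its own; it is not the CRM's actual scoring algorithm, and the "prefer a node A can also use" tie-break in the new branch is an assumption.

```python
from typing import Optional


def place(a_ok: set[str], b_ok: set[str], new_semantics: bool) -> dict[str, Optional[str]]:
    """Pick a node for A and B under colocate(A, B, INFINITY).

    a_ok / b_ok: nodes where each resource could run if it were on its own.
    Returns {"A": node or None, "B": node or None}; None means "not running".
    """
    shared = sorted(a_ok & b_ok)
    if not new_semantics:
        # Pre-2.0.8: both resources need a location they can share; if there
        # is none, that location is "nowhere" and neither of them runs.
        node = shared[0] if shared else None
        return {"A": node, "B": node}
    # 2.0.8 semantics as described in the thread: B is placed on its own
    # merits (this sketch prefers a node A could also use), and A may only
    # run where B ends up.
    candidates = shared or sorted(b_ok)
    b_node = candidates[0] if candidates else None
    a_node = b_node if b_node in a_ok else None
    return {"A": a_node, "B": b_node}


# A cannot run anywhere, B could run on n1 or n2.
print(place(set(), {"n1", "n2"}, new_semantics=False))  # {'A': None, 'B': None}
print(place(set(), {"n1", "n2"}, new_semantics=True))   # {'A': None, 'B': 'n1'}
```

Under the new branch, B keeps running when A cannot, while A still cannot run when B cannot, which is the asymmetry Andrew describes for "A - B".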



i thought that the "this can run independently of the other" part
would have been governed by an additional ordering constraint,
something like:

- A and B are collocated
- A starts before B

then we can assume that B depends on A to run. wouldn't that be
more logical?


can you give one example where 

Re: [Linux-HA] Virtual IP alias gets incremented

2007-03-27 Thread Alan Robertson
Abdul Khader wrote:
 Hi All,
 I am running Heartbeat 1.2.3-2 on Fedora Core 3 with the 2.6.11-1.35_FC3smp
 kernel. The problem is that when I fail over multiple times, the virtual
 interface alias eth0:1 becomes eth0:2. It keeps increasing by one
 on each failover. I have seen a virtual alias as high as eth0:12.
 
 Any help or pointers would be great.

There's something on your system causing that.  What it is I couldn't
say.  But we do track which aliases are in use, and make sure we don't
grab one that's already in use.

Something in your environment is interfering with that logic.
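To show why leftover aliases would make the number climb, here is a rough Python sketch of "track which aliases are in use and don't grab one that's already taken". It is an assumption about the general approach, not the IPaddr script's actual code; numbering from 0 is also an assumption (Abdul's first alias was eth0:1).

```python
import re

# Matches alias names such as "eth0:1" (device, colon, number).
ALIAS_RE = re.compile(r"^(?P<dev>[\w.]+):(?P<num>\d+)$")


def next_free_alias(base: str, existing: list[str]) -> str:
    """Return the lowest-numbered '<base>:N' alias that is not already in use."""
    used = set()
    for name in existing:
        m = ALIAS_RE.match(name)
        if m and m.group("dev") == base:
            used.add(int(m.group("num")))
    n = 0
    while n in used:
        n += 1
    return f"{base}:{n}"


# Leftover aliases from earlier failovers still count as "in use",
# so each new takeover gets the next number up.
print(next_free_alias("eth0", ["lo", "eth0", "eth0:0", "eth0:1"]))  # eth0:2
```

If something in the environment leaves old aliases configured (or recreates them) after each failover, a scheme like this would keep handing out higher numbers, which matches the symptom reported.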


-- 
Alan Robertson [EMAIL PROTECTED]



Re: [Linux-HA] haclient.py requires a port to login ?

2007-03-27 Thread Alan Robertson
Karl Hanzel wrote:
 
 'Just upgraded to heartbeat-2.0.8-2.el4.centos, and its new companion
 RPMs.
 
 'Found the new haclient.py under /usr/lib64/heartbeat-gui/.  It starts
 up fine, but upon login to my running cluster and supplying these args
 to the Login window: 127.0.0.1 / hacluster / hacluster's_pw (which
 always worked previously), i'm now getting:
 
 -- 
 Traceback (most recent call last):
   File "/usr/lib64/heartbeat-gui/haclient.py", line 1598, in on_login
     if not manager.login(server, user, password):
   File "/usr/lib64/heartbeat-gui/haclient.py", line 1943, in login
     ret = mgmt_connect(ip, username, password, port)
 TypeError: mgmt_connect() argument 4 must be string, not None
 -- 
 
 ...and it fails to login/connect.
 
 If in the Server(:port): field of the Login window i specify
 127.0.0.1:xyz, the login succeeds without the above complaint.  That
 xyz can be literal... i seem to be able to supply anything (including
 a null string) there.
 
 So what's the rub ... why do i/we now have to specify a port?  And if we
 do, what's the appropriate port to specify?

You don't have to supply a port...  You _can_ specify a port if you like...

You can check out the screencast here, and see what I mean...
http://linux-ha.org/Education/Newbie/IPaddrScreencast

In other words It Works For Me (tm)  ;-)



-- 
Alan Robertson [EMAIL PROTECTED]



Re: [Linux-HA] How to set this up correctly...?

2007-03-27 Thread Alan Robertson
Howard Yuan wrote:
 Why does the service die when the crossover breaks? Because I was
 planning on telling the services to look for MySQL on 10.0.0.5 (the
 floating IP) and if the crossover link breaks, the systems don't know
 how to get to 10.0.0.5 as they're looking to find it via the
 crossover link (LAN 2).

OK.  I understand that.  So, mysql needs connectivity via the crossover
cable.

 I need more configuration information to give a detailed answer. 
 Hum...what information would you need to understand it better? I can
 try to draw an ASCII picture of the network diagram if you need me
 to.

No, I think that was enough for now.


 But, the short answer is 'run an R2 configuration with pingd
 properly configured for your problem'
 I looked at this for a while and I can't figure out what you mean by R2.

I mean a "crm yes" configuration.

 Also, on my heartbeat configuration right now, I'm using "crm no" so that
 I can use ipfail, as someone on the mailing list said that
 ipfail doesn't work with crm. Does CRM include a replacement for
 ipfail that works better?

It's called pingd ;-).

I'm not sure that either toolset precisely addresses your problem.

What you likely really want is this:

If one node is up, run all services there.
If both nodes are up, and the two nodes can't talk across
the crossover, then:
    run both services on the machine with better
    connectivity to your clients
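The policy above can be written as a small decision function. This is only a sketch of the rule, not something heartbeat or pingd provides: `client_reach` is a hypothetical per-node count of reachable client-side ping nodes standing in for "connectivity to your clients", and breaking ties by node name is an arbitrary assumption.

```python
from typing import Optional


def where_to_run(up: dict[str, bool], crossover_ok: bool,
                 client_reach: dict[str, int]) -> Optional[str]:
    """Return the node that should run *all* services, or None for no override.

    up:           node name -> is the node alive?
    crossover_ok: can the nodes still talk over the crossover link?
    client_reach: node name -> number of client-side ping nodes it can reach
                  (hypothetical stand-in for "connectivity to your clients").
    """
    alive = sorted(n for n, ok in up.items() if ok)
    if not alive:
        return None            # nothing is up, so there is nowhere to run
    if len(alive) == 1:
        return alive[0]        # one node up: run all services there
    if crossover_ok:
        return None            # both up and talking: leave normal placement alone
    # Both up but unable to talk across the crossover: pick the node with the
    # best client connectivity; ties go to the alphabetically first node.
    return min(alive, key=lambda n: (-client_reach.get(n, 0), n))


print(where_to_run({"node1": True, "node2": True}, crossover_ok=False,
                   client_reach={"node1": 1, "node2": 3}))  # node2
```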

It's the dual-connectivity test that you'd really like to have that
pingd won't really handle.

I believe that pingd treats all ping nodes the same.  But, to truly
solve this problem, you need to treat outside ping nodes differently
from inside ping nodes.

You _can_ solve the problem in R2, you'll just have to write your own
pingd replacement - since it doesn't have to be general, it'll be easy
enough to write, but you'll still have to do it...  You could do it all
in the shell if you like...

Then you only tell heartbeat about one of the sets of ping nodes, and
not the other set, and your tool would manage the other set.
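The custom tool mainly has to keep the two groups of ping nodes apart. A minimal sketch of its scoring half, assuming the reachability results are gathered elsewhere (for example by pinging each address): only the outside group contributes, and the resulting value would then be published as a node attribute for CIB rules to use. The group split, the multiplier and the publishing step are all assumptions, and the addresses are hypothetical.

```python
def connectivity_score(reachable: dict[str, bool], outside: set[str],
                       multiplier: int = 100) -> int:
    """Score this node using only the *outside* (client-side) ping nodes.

    reachable:  ping node address -> did it answer from this node?
    outside:    addresses of client-side ping nodes; every other address is
                treated as an inside ping node and deliberately ignored here.
    multiplier: hypothetical weight per reachable outside ping node.
    """
    return multiplier * sum(1 for host, ok in reachable.items()
                            if ok and host in outside)


# Hypothetical addresses: 10.0.0.1 is an inside ping node, 192.0.2.x are outside.
results = {"10.0.0.1": True, "192.0.2.10": True, "192.0.2.11": False}
print(connectivity_score(results, outside={"192.0.2.10", "192.0.2.11"}))  # 100
```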

But, none of this is likely to make much sense to you unless you
understand the CRM's way of doing things through the rules in the CIB.

This is explained in some detail in my tutorial on R2.  It's the newest
tutorial in the http://linux-ha.org/HeartbeatTutorials page on the web
site.  The relevant section is slides 137-145.

If you haven't used R2 at all, then maybe reviewing the presentation
from the beginning would be good.  There is a 90-minute video covering
some basic things -- given from these slides - and it's linked to from
that same page.  If you have trouble viewing the video directly, then
try the abstract page - it has an embedded video viewer in Java at the
bottom of the web page.

-- 
Alan Robertson [EMAIL PROTECTED]



[Linux-HA] Monitoring resources

2007-03-27 Thread M.
Hi

I have Debian Etch, with:

- heartbeat 2.0.7-2
- drbd 0.7

and I need to monitor postfix, apache2, ldap and courier. I already
have the cluster configured and it works without problems, but I need
to activate monitoring. I saw the CRM material on linux-ha.org, but
I do not know how to do it.

Is there a page with some details about the implementation?

Thanks a lot !!!

Michael.-





[Linux-HA] ip address resources fail to start

2007-03-27 Thread Robert Fowler
Every time I go to start heartbeat I get an error stating that the IP address
resource is stopped, and as a result heartbeat fails to start the virtual IP
address.

 

haresources looks like

testa 172.16.9.148 drbddisk::drbd0
Filesystem::/dev/drbd0::/transactions::ext3

 

ha.cf looks like

crm false
auto_failback off
watchdog /dev/watchdog
node testa
node testb
ping 172.16.9.161
deadping 30
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 694
bcast   eth0 eth1   # Linux

 

any ideas appreciated



[Linux-HA] Masters take long time to get back the ip from slave

2007-03-27 Thread Austin Rock

I have configured HA with the latest version.  When my master goes down, the
slave automatically takes over the IP within 1 second, as I have specified in
my ha.cf file.  But when the master comes back, it takes around 40-50 seconds
to take the IP back.  Where can I set that time so that when the master comes
back up it takes over in 1 second only?


Thanks.


Re: [Linux-HA] haclient.py requires a port to login ?

2007-03-27 Thread Alan Robertson
Paulo F. Andrade wrote:
 Actually a similar thing happens on my Mac; the difference is that I do
 have to specify the correct port to connect. Anything else won't do.
 
 On my Linux systems (running ubuntu and gentoo) I don't have to specify
 the port number, and it defaults to 5560.
 
 Here's the output when I don't specify a port number:
 Traceback (most recent call last):
   File "/sw/lib/heartbeat-gui/haclient.py", line 1598, in on_login
     if not manager.login(server, user, password):
   File "/sw/lib/heartbeat-gui/haclient.py", line 1943, in login
     ret = mgmt_connect(ip, username, password, port)
 TypeError: mgmt_connect() argument 4 must be string, not None
 
 It's no big deal, but it must be a bug.
 Paulo F. Andrade [EMAIL PROTECTED]

Could one of you kindly make a bugzilla for this for me please?
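Both tracebacks show `port` reaching `mgmt_connect()` as None when the Server field has no `:port` suffix, while Paulo's Linux systems fall back to 5560. A sketch of the kind of defaulting that would avoid the TypeError; `split_server` and `DEFAULT_PORT` are hypothetical names, not haclient.py's actual code.

```python
DEFAULT_PORT = "5560"  # default port observed on Paulo's Linux systems


def split_server(field: str, default_port: str = DEFAULT_PORT) -> tuple[str, str]:
    """Split a 'host' or 'host:port' login field into (host, port).

    The port stays a string, since mgmt_connect() rejected None with
    "argument 4 must be string"; a missing or empty port falls back to
    default_port instead of becoming None.  IPv6 literals are not handled.
    """
    host, sep, port = field.strip().partition(":")
    if not sep or not port:
        port = default_port
    return host, port


print(split_server("127.0.0.1"))      # ('127.0.0.1', '5560')
print(split_server("127.0.0.1:xyz"))  # ('127.0.0.1', 'xyz')
```

This also explains Karl's observation that any suffix, even "xyz", got past the error: any string satisfies the type check, so the failure only appears when no port string is produced at all.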

-- 
Alan Robertson [EMAIL PROTECTED]
