[Pacemaker] location constraint question

2010-09-20 Thread Pavlos Parissis
Hi, I am having trouble understanding why my DRBD ms resource wants a location constraint. My setup is quite simple: 3 nodes, 2 resource groups which hold the ip, fs and dummy resources, 2 resources for 2 drbd, and 2 master/slave resources for 2 DRBD. The objective is to have pbx_service_01 to use as

Re: [Pacemaker] location constraint question

2010-09-21 Thread Pavlos Parissis
On 21 September 2010 08:38, Andrew Beekhof and...@beekhof.net wrote: BTW, why does crm_mon report only 4 resources? Because the drbd resources were made into master/slaves. See: ms ms-drbd_01 drbd_01 \ meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

Re: [Pacemaker] location constraint question

2010-09-21 Thread Pavlos Parissis
On 21 September 2010 09:04, Andrew Beekhof and...@beekhof.net wrote: On Tue, Sep 21, 2010 at 8:58 AM, Pavlos Parissis pavlos.paris...@gmail.com wrote: On 21 September 2010 08:38, Andrew Beekhof and...@beekhof.net wrote: BTW, why does crm_mon report only 4 resources? Because the drbd resources

[Pacemaker] migration-threshold and failure-timeout

2010-09-21 Thread Pavlos Parissis
Hi, I am trying to figure out a way to do the following: if the monitor of resource x fails N times in a period of Z, then fail over to the other node and clear the fail-count. Regards, Pavlos ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
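A minimal sketch of the two meta attributes named in this thread's subject, in crm shell syntax. The resource name, the lsb agent name, and all values are hypothetical and not taken from the thread. As far as I understand it, migration-threshold moves the resource away after that many failures on a node, and failure-timeout lets old failures expire once no new failure has occurred for that long; it is not an exact "N failures within Z" window.

```
# Hypothetical resource; name, agent and values are assumptions for illustration.
primitive pbx_example lsb:pbx_example \
    op monitor interval="20s" timeout="30s" \
    meta migration-threshold="3" failure-timeout="120s"
```

Expired failures are only noticed when the cluster re-evaluates its state, so the effective delay can also depend on the cluster-recheck-interval property.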

Re: [Pacemaker] migration-threshold and failure-timeout

2010-09-22 Thread Pavlos Parissis
On 21 September 2010 15:28, Vadym Chepkov vchep...@gmail.com wrote: On Tue, Sep 21, 2010 at 9:14 AM, Dan Frincu dfri...@streamwide.ro wrote: Hi, This = http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-failure-migration.html explains it pretty well. Notice

[Pacemaker] target-role default value

2010-09-24 Thread Pavlos Parissis
Hi, What is the default value for target-role in resource? I tried to query it with crm_resource but without success. crm_resource pbx_02 --get-property target-role crm_resource pbx_02 --get-parameter target-role --meta Cheers, Pavlos

Re: [Pacemaker] target-role default value

2010-09-24 Thread Pavlos Parissis
On 24 September 2010 11:40, Michael Schwartzkopff mi...@clusterbau.com wrote: On Friday 24 September 2010 11:34:11 Pavlos Parissis wrote: Hi, What is the default value for target-role in resource? I tried to query it with crm_resource but without success. crm_resource pbx_02 --get

Re: [Pacemaker] default timeout for op start/stop

2010-09-24 Thread Pavlos Parissis
On 24 September 2010 13:54, Michael Schwartzkopff mi...@clusterbau.com wrote: On Friday 24 September 2010 13:50:49 Pavlos Parissis wrote: Hi, When I verify my conf I get complaints about the timeout on the start and stop operations crm(live)configure# verify WARNING: drbd_01: default

Re: [Pacemaker] default timeout for op start/stop

2010-09-27 Thread Pavlos Parissis
On 24 September 2010 18:12, Dejan Muhamedagic deja...@fastmail.fm wrote: [...snip...] Default timeout is coded into the resource agent. You safely can ignore the WARNINGs. These are also removed from more recent versions of pacemaker. These warnings shouldn't be ignored. The defaults

Re: [Pacemaker] default timeout for op start/stop

2010-09-27 Thread Pavlos Parissis
On 27 September 2010 12:17, Dejan Muhamedagic deja...@fastmail.fm wrote: Hi, On Mon, Sep 27, 2010 at 12:00:19PM +0200, Pavlos Parissis wrote: On 24 September 2010 18:12, Dejan Muhamedagic deja...@fastmail.fm wrote: [...snip...] Default timeout is coded into the resource agent

[Pacemaker] crm resource move doesn't move the resource

2010-09-28 Thread Pavlos Parissis
Hi, When I issue crm resource move pbx_service_01 node-0N it moves this resource group, but the fs_01 resource is not started because drbd_01 is still running on the other node and is not moved to node-0N as well, even though I have colocation constraints. I am pretty sure that I had that working before,

[Pacemaker] promote a ms resource to a node

2010-09-28 Thread Pavlos Parissis
Hi, Let's say that I have manually demoted a ms resource and have the following situation: crm(live)resource# demote ms-drbd_01 crm(live)resource# status [..snip..] Master/Slave Set: ms-drbd_01 Slaves: [ node-01 node-03 ] How can I manually promote ms-drbd_01 on node-03? The promote command

Re: [Pacemaker] crm resource move doesn't move the resource

2010-09-29 Thread Pavlos Parissis
On 28 September 2010 15:09, Pavlos Parissis pavlos.paris...@gmail.com wrote: Hi, When I issue crm resource move pbx_service_01 node-0N it moves this resource group, but the fs_01 resource is not started because drbd_01 is still running on the other node and is not moved to node-0N as well, even

Re: [Pacemaker] Does bond0 network interface work with corosync/pacemaker

2010-09-29 Thread Pavlos Parissis
Please paste the corosync conf; without the conf it is quite difficult to help you. Cheers, Pavlos

Re: [Pacemaker] Does bond0 network interface work with corosync/pacemaker

2010-09-29 Thread Pavlos Parissis
On 29 September 2010 21:01, Andreas Hofmeister a...@collax.com wrote: On 29.09.2010 19:59, Mike A Meyer wrote: We have two nodes that we have the IP address assigned to a bond0 network interface instead of the usual eth0 network interface. We are wondering if there are issues with trying

Re: [Pacemaker] Does bond0 network interface work with corosync/pacemaker

2010-09-30 Thread Pavlos Parissis
On 30 September 2010 15:23, Mike A Meyer mme...@cds-global.com wrote: Pavlos, Thanks for helping out on this. We are running on RHEL 5.5 running on the iron and not a VM. We don't have SELinux turned on and the firewall is disabled. Here is information in the /etc/modprobe.conf file.

Re: [Pacemaker] resources are restarted without obvious reasons

2010-10-01 Thread Pavlos Parissis
Hi Could be related to a possible bug mentioned here[1]? BTW here is the conf of pacemaker node $id=b8ad13a6-8a6e-4304-a4a1-8f69fa735100 node-02 node $id=d5557037-cf8f-49b7-95f5-c264927a0c76 node-01 node $id=e5195d6b-ed14-4bb3-92d3-9105543f9251 node-03 primitive drbd_01 ocf:linbit:drbd \

Re: [Pacemaker] resources are restarted without obvious reasons

2010-10-01 Thread Pavlos Parissis
= TRUE; } return delete_resource; } On 1 October 2010 09:13, Pavlos Parissis pavlos.paris...@gmail.com wrote: Hi Could be related to a possible bug mentioned here[1]? BTW here is the conf of pacemaker node $id=b8ad13a6-8a6e-4304-a4a1-8f69fa735100 node-02 node $id=d5557037-cf8f-49b7

Re: [Pacemaker] crm resource move doesn't move the resource

2010-10-02 Thread Pavlos Parissis
I am wondering if resource-stickiness=1000 could be the reason for the behavior I see, but again, when I recreated the ms-drbd on the other cluster the issue was solved.

Re: [Pacemaker] promote a ms resource to a node

2010-10-03 Thread Pavlos Parissis
it. Not sure about the shell syntax though. On Tue, Sep 28, 2010 at 3:51 PM, Pavlos Parissis pavlos.paris...@gmail.com wrote: Hi, Let's say that I have manually demoted a ms resource and have the following situation: crm(live)resource# demote ms-drbd_01 crm(live)resource# status [..snip

[Pacemaker] Recommend Fencing device

2010-10-04 Thread Pavlos Parissis
Hi, which fencing devices would you recommend? I want to use a device which will give as few problems as possible when configuring a fencing resource for a 3-node cluster. Regards, Pavlos

Re: [Pacemaker] resources are restarted without obvious reasons

2010-10-05 Thread Pavlos Parissis
On 5 October 2010 11:15, Andrew Beekhof and...@beekhof.net wrote: On Fri, Oct 1, 2010 at 9:53 AM, Pavlos Parissis pavlos.paris...@gmail.com wrote: Hi, It seems that it happens every time PE wants to check the conf 09:23:55 crmd: [3473]: info: crm_timer_popped: PEngine Recheck Timer

[Pacemaker] init Script fails in 1 of LSB Compatible test

2010-10-05 Thread Pavlos Parissis
Hi, I am thinking of putting sshd under cluster control and I am checking whether the /etc/init.d/sshd supplied by RedHat 5.4 is compatible with LSB. So, I ran the test mentioned here [1] and it fails at test 6: it returns 1 and a failed message. Could this create problems within pacemaker? Regards,

Re: [Pacemaker] init Script fails in 1 of LSB Compatible test

2010-10-05 Thread Pavlos Parissis
On 5 October 2010 13:19, Andrew Beekhof and...@beekhof.net wrote: On Tue, Oct 5, 2010 at 12:51 PM, Pavlos Parissis pavlos.paris...@gmail.com wrote: Hi, I am thinking of putting sshd under cluster control and I am checking whether the /etc/init.d/sshd supplied by RedHat 5.4 is compatible

Re: [Pacemaker] Online and Offline status when doing crm_mon

2010-10-06 Thread Pavlos Parissis
On 5 October 2010 22:12, Mike A Meyer mme...@cds-global.com wrote: We are setup in a two node active/passive cluster using pacemaker/corosync. We shutdown the pacemaker/corosync on both nodes and changed the uname -n on our nodes to show the short name instead of the FQDN. Started up

Re: [Pacemaker] pacemaker version

2010-10-07 Thread Pavlos Parissis
On 7 October 2010 08:33, Andrew Beekhof and...@beekhof.net wrote: On Wed, Oct 6, 2010 at 5:04 PM, Gianluca Cecchi gianluca.cec...@gmail.com wrote: On Wed, Oct 6, 2010 at 4:25 PM, Shravan Mishra shravan.mis...@gmail.com wrote: That is what I heard too, that's the reason for this question.

Re: [Pacemaker] crm resource move doesn't move the resource

2010-10-07 Thread Pavlos Parissis
On 8 October 2010 04:26, jiaju liu liujiaj...@yahoo.com.cn wrote: Message: 2 Date: Thu, 7 Oct 2010 21:58:29 +0200 From: Pavlos Parissis pavlos.paris...@gmail.com To: The Pacemaker cluster resource manager

Re: [Pacemaker] crm resource move doesn't move the resource

2010-10-08 Thread Pavlos Parissis
On 8 October 2010 08:29, Andrew Beekhof and...@beekhof.net wrote: On Thu, Oct 7, 2010 at 9:58 PM, Pavlos Parissis pavlos.paris...@gmail.com wrote: On 7 October 2010 09:01, Andrew Beekhof and...@beekhof.net wrote: On Sat, Oct 2, 2010 at 6:31 PM, Pavlos Parissis pavlos.paris...@gmail.com

Re: [Pacemaker] pacemaker version

2010-10-08 Thread Pavlos Parissis
On 8 October 2010 09:28, Andrew Beekhof and...@beekhof.net wrote: On Fri, Oct 8, 2010 at 8:31 AM, Pavlos Parissis pavlos.paris...@gmail.com wrote: On 8 October 2010 07:47, Andrew Beekhof and...@beekhof.net wrote: On Thu, Oct 7, 2010 at 10:10 PM, Pavlos Parissis pavlos.paris...@gmail.com wrote

Re: [Pacemaker] crm resource move doesn't move the resource

2010-10-08 Thread Pavlos Parissis
On 8 October 2010 09:29, Andrew Beekhof and...@beekhof.net wrote: On Fri, Oct 8, 2010 at 8:34 AM, Pavlos Parissis pavlos.paris...@gmail.com wrote: On 8 October 2010 08:29, Andrew Beekhof and...@beekhof.net wrote: On Thu, Oct 7, 2010 at 9:58 PM, Pavlos Parissis pavlos.paris...@gmail.com wrote

[Pacemaker] unpack_rsc_op: Hard error

2010-10-09 Thread Pavlos Parissis
Hi, Does anyone know why PE wants to unpack resources on nodes where they will never run due to location constraints? I am getting these messages and I am wondering if they are harmless or not. 23:12:38 pengine: [7705]: notice: unpack_rsc_op: Hard error - sshd-pbx_01_monitor_0 failed with rc=5: Preventing

Re: [Pacemaker] crm resource move doesn't move the resource

2010-10-09 Thread Pavlos Parissis
On 8 October 2010 22:05, Pavlos Parissis pavlos.paris...@gmail.com wrote: On 8 October 2010 09:29, Andrew Beekhof and...@beekhof.net wrote: On Fri, Oct 8, 2010 at 8:34 AM, Pavlos Parissis pavlos.paris...@gmail.com wrote: On 8 October 2010 08:29, Andrew Beekhof and...@beekhof.net wrote: On Thu

[Pacemaker] crmd thinks lsb returns error on monito

2010-10-09 Thread Pavlos Parissis
Hi, My resource is not started because I get this: 00:44:27 crmd: [3141]: WARN: status_from_rc: Action 16 (pbx_02_monitor_0) on node-02 failed (target: 7 vs. rc: 5): Error but when I run the status manually I get 3, which is ok because the application is stopped [r...@node-02 ~]#

Re: [Pacemaker] unpack_rsc_op: Hard error

2010-10-10 Thread Pavlos Parissis
On 9 October 2010 23:20, Pavlos Parissis pavlos.paris...@gmail.com wrote: Hi, Does anyone know why PE wants to unpack resources on nodes where they will never run due to location constraints? I am getting these messages and I am wondering if they are harmless or not. 23:12:38 pengine: [7705]: notice

Re: [Pacemaker] crm resource move doesn't move the resource

2010-10-11 Thread Pavlos Parissis
On 8 October 2010 09:29, Andrew Beekhof and...@beekhof.net wrote: On Fri, Oct 8, 2010 at 8:34 AM, Pavlos Parissis pavlos.paris...@gmail.com wrote: On 8 October 2010 08:29, Andrew Beekhof and...@beekhof.net wrote: On Thu, Oct 7, 2010 at 9:58 PM, Pavlos Parissis pavlos.paris...@gmail.com wrote

Re: [Pacemaker] crmd thinks lsb returns error on monito

2010-10-11 Thread Pavlos Parissis
On 10 October 2010 17:40, Andrew Beekhof and...@beekhof.net wrote: On Sun, Oct 10, 2010 at 12:47 AM, Pavlos Parissis pavlos.paris...@gmail.com wrote: Hi, My resource is not started because I get this 00:44:27 crmd: [3141]: WARN: status_from_rc: Action 16 (pbx_02_monitor_0) on node-02

Re: [Pacemaker] unpack_rsc_op: Hard error

2010-10-11 Thread Pavlos Parissis
On 10 October 2010 17:39, Andrew Beekhof and...@beekhof.net wrote: On Sat, Oct 9, 2010 at 11:20 PM, Pavlos Parissis pavlos.paris...@gmail.com wrote: Hi, Does anyone know why PE wants to unpack resources on nodes that will never run due to location constraints? Because part of its job

Re: [Pacemaker] resource is stuck

2010-10-11 Thread Pavlos Parissis
On 11 October 2010 11:12, Pavlos Parissis pavlos.paris...@gmail.com wrote: Hi, Cluster got an error on monitor and stop action on a resource and since then I can't do stop/start/manage/unmanage that resource. For some strange reason the actions monitor/stop failed, manually worked, but i

[Pacemaker] 1st monitor is too fast after the start

2010-10-12 Thread Pavlos Parissis
Hi, I noticed a race condition while I was integrating an application with Pacemaker and thought I would share it with you. The init script of the application is LSB-compliant and passes the tests mentioned in the Pacemaker documentation. Moreover, the init script uses the supplied functions from the

[Pacemaker] sshd under cluster

2010-10-12 Thread Pavlos Parissis
/subsys/sshd-pbx_02 ] ; then +success +RETVAL=0 else failure $Stopping $prog fi RETVAL=$? + +### Added by Pavlos Parissis ### +# Disable the below bit because killall kills the script itself. +# This causes

Re: [Pacemaker] Migrate resources based on connectivity

2010-10-12 Thread Pavlos Parissis
On 12 October 2010 20:00, Dan Frincu dfri...@streamwide.ro wrote: Hi, Lars Ellenberg wrote: On Mon, Oct 11, 2010 at 03:50:01PM +0300, Dan Frincu wrote: Hi, Dejan Muhamedagic wrote: Hi, On Sun, Oct 10, 2010 at 10:27:13PM +0300, Dan Frincu wrote: Hi, I have the following setup:

Re: [Pacemaker] 1st monitor is too fast after the start

2010-10-13 Thread Pavlos Parissis
On 13 October 2010 09:48, Dan Frincu dfri...@streamwide.ro wrote: Hi, I've noticed the same type of behavior, however in a different context, my setup includes 3 drbd devices and a group of resources, all have to run on the same node and move together to other nodes. My issue was with the

Re: [Pacemaker] 1st monitor is too fast after the start

2010-10-13 Thread Pavlos Parissis
On 13 October 2010 10:50, Dan Frincu dfri...@streamwide.ro wrote: From what I see you have a dual primary setup with failover on the third node, basically if you have one drbd resource for which you have both ordering and collocation, I don't think you need to improve it, if it ain't broke,

Re: [Pacemaker] crm resource move doesn't move the resource

2010-10-13 Thread Pavlos Parissis
On 11 October 2010 11:16, Pavlos Parissis pavlos.paris...@gmail.com wrote: On 8 October 2010 09:29, Andrew Beekhof and...@beekhof.net wrote: On Fri, Oct 8, 2010 at 8:34 AM, Pavlos Parissis pavlos.paris...@gmail.com wrote: On 8 October 2010 08:29, Andrew Beekhof and...@beekhof.net wrote

Re: [Pacemaker] Active-Active HA Firewall

2010-10-15 Thread Pavlos Parissis
On 15 October 2010 09:47, Marcel Hauser marcel_hau...@gmx.ch wrote: But that is no problem. firewalling is no hard job any more. A reasonable machine can firewall 1 GBit/s traffic. valid point. my only concern is/was that i don't like the idea of a passive firewall because when you

Re: [Pacemaker] Help understanding why a failover occurred.

2010-10-16 Thread Pavlos Parissis
On 16 October 2010 00:45, Jai away...@gmail.com wrote: I have setup a DRBD-Xen failover cluster. Last night at around 02:50 it failed the resources from server bravo to alpha. I'm trying to find out what caused the failover of resources. I don't see anything in the logs that indicate the

[Pacemaker] using xml for rules

2010-10-17 Thread Pavlos Parissis
Hi, I am trying to make a rule to control failback of the resources. During working days from 06:00 to 23:00 and on weekends from 08:00 to 16:00 I want resource-stickiness to be 1000, and zero in the remaining hours, so the cluster can fail back any resource which failed over during the working

Re: [Pacemaker] Help understanding why a failover occurred.

2010-10-18 Thread Pavlos Parissis
On 18 October 2010 04:03, Tim Serong tser...@novell.com wrote: On 10/16/2010 at 09:45 AM, Jai away...@gmail.com wrote: I have setup a DRBD-Xen failover cluster. Last night at around 02:50 it failed the resources from server bravo to alpha. I'm trying to find out what caused the failover

Re: [Pacemaker] Help understanding why a failover occurred.

2010-10-18 Thread Pavlos Parissis
On 18 October 2010 05:17, Jai away...@gmail.com wrote: I don't see anything either, but I am not surprised by that. I have seen a similar issue on my cluster where logs weren't that helpful. Does it still occur on your cluster? No, I haven't seen it again. But it could be that I

Re: [Pacemaker] Question: How many nodes can join a cluster?

2010-10-18 Thread Pavlos Parissis
On 18 October 2010 10:52, Florian Haas florian.h...@linbit.com wrote: - Original Message - From: Andreas Vogelsang a.vogels...@uni-muenster.de To: pacemaker@oss.clusterlabs.org Sent: Monday, October 18, 2010 9:46:12 AM Subject: [Pacemaker] Question: How many nodes can join a

Re: [Pacemaker] Question: How many nodes can join a cluster?

2010-10-18 Thread Pavlos Parissis
On 18 October 2010 11:13, Dan Frincu dfri...@streamwide.ro wrote: Pavlos Parissis wrote: On 18 October 2010 10:52, Florian Haas florian.h...@linbit.com wrote: - Original Message - From: Andreas Vogelsang a.vogels...@uni-muenster.de To: pacemaker@oss.clusterlabs.org Sent

Re: [Pacemaker] unpack_rsc_op: Hard error

2010-10-19 Thread Pavlos Parissis
On 19 October 2010 14:16, Andrew Beekhof and...@beekhof.net wrote: On Mon, Oct 11, 2010 at 11:25 AM, Pavlos Parissis pavlos.paris...@gmail.com wrote: On 10 October 2010 17:39, Andrew Beekhof and...@beekhof.net wrote: On Sat, Oct 9, 2010 at 11:20 PM, Pavlos Parissis pavlos.paris

Re: [Pacemaker] Failover domains?

2010-10-26 Thread Pavlos Parissis
On 25 October 2010 19:50, David Quenzler quenz...@gmail.com wrote: Is there a way to limit failover behavior to a subset of cluster nodes or pin a resource to a node? Yes, there is a way. Make sure you have an asymmetric cluster by setting symmetric-cluster to false and then configure
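A minimal sketch of the approach described in the reply above, in crm shell syntax. The resource name, constraint ids, and scores are hypothetical; the node names are borrowed from other threads in this list purely for illustration. In an asymmetric (opt-in) cluster a resource may only run on nodes for which a location constraint gives it a non-negative score.

```
# Opt-in cluster: resources run nowhere unless a constraint allows it.
property symmetric-cluster="false"

# Hypothetical resource rsc_example: prefer node-01, allow node-03 as fallback.
location loc-rsc_example-node-01 rsc_example 200: node-01
location loc-rsc_example-node-03 rsc_example 100: node-03
```

With only these two constraints, rsc_example would never be placed on any other node, which is one way to get a "failover domain".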

[Pacemaker] AP9606 fencing device

2010-10-27 Thread Pavlos Parissis
Hi, I have an APC AP9606 PDU and I am trying to find a stonith agent which works with that PDU. The apcmaster and apcmastersnmp agents don't work, as you can see below. I managed to get rackpdu working by setting the outlet config (the oid for snmpwalk fails) and also setting the command OID. Here is a

Re: [Pacemaker] AP9606 fencing device

2010-10-27 Thread Pavlos Parissis
On 27 October 2010 14:09, Vadym Chepkov vchep...@gmail.com wrote: [...snip...] Hold on a sec, are you using clone on AP7901? Does it support multiple connections? Mine it doesn't. Then it's useless regardless clone or not, you have to have multiple instances, because server can't reliable

Re: [Pacemaker] AP9606 fencing device

2010-10-27 Thread Pavlos Parissis
Hi, I quickly tested cloning on this fencing and it worked. I used iptables to break the heartbeat link on node-01 and it was fenced by the other node - the DC. In the coming days I will test without cloning fencing device. Cheers, Pavlos

Re: [Pacemaker] AP9606 fencing device

2010-10-27 Thread Pavlos Parissis
On 27 October 2010 14:09, Vadym Chepkov vchep...@gmail.com wrote: On Oct 27, 2010, at 7:58 AM, Pavlos Parissis wrote: On 27 October 2010 13:43, Vadym Chepkov vchep...@gmail.com wrote: On Oct 27, 2010, at 7:27 AM, Pavlos Parissis wrote: On 27 October 2010 13:12, Vadym Chepkov vchep

Re: [Pacemaker] AP9606 fencing device

2010-10-27 Thread Pavlos Parissis
On 27 October 2010 19:23, Vadym Chepkov vchep...@gmail.com wrote: On Oct 27, 2010, at 1:18 PM, Pavlos Parissis wrote: ok, i have done the same hack but i will remove it. I think 1.1.4 will be out before we go on production and hopefully this will be fixed in 1.1.4. This is part

Re: [Pacemaker] AP9606 fencing device

2010-10-27 Thread Pavlos Parissis
I did more testing using the clone type of fencing and it worked as I expected. test1: hack the init script to return 1 on stop and run a crm resource move on that resource. Result: the node was fenced and the resource was started on the other node. test2: using the firewall to break the heartbeat links on node with

Re: [Pacemaker] AP9606 fencing device

2010-10-27 Thread Pavlos Parissis
On 27 October 2010 19:46, Pavlos Parissis pavlos.paris...@gmail.com wrote: I did more testing using the clone type of fencing and it worked as I expected. test1: hack the init script to return 1 on stop and run a crm resource move on that resource. Result: the node was fenced and the resource was started

Re: [Pacemaker] Multiple independent two-node clusters side-by-side?

2010-10-27 Thread Pavlos Parissis
This http://www.gossamer-threads.com/lists/linuxha/users/67482?search_string=Redundant%20Rings%20%26quot;Still%20Not%20There%3F;#67482 post has a lot of information for you on this subject. Cheers, Pavlos

[Pacemaker] Pacemaker-1.1.4, when?

2010-10-28 Thread Pavlos Parissis
Hi, When do we expect to have Pacemaker-1.1.4 available? Cheers, Pavlos

Re: [Pacemaker] Impossible to add a 4th node to a cluster

2010-10-28 Thread Pavlos Parissis
On 28 October 2010 18:30, Guillaume Chanaud guillaume.chan...@connecting-nature.com wrote: [...snip...] corosync and auth files are the same on server2? Yes of course :D (copied by scp), as i told server1 can join when server2 is offline, and server 2 can join when server1 is offline, but if

Re: [Pacemaker] Pacemaker-1.1.4, when?

2010-10-29 Thread Pavlos Parissis
On 28 October 2010 22:55, Andrew Beekhof and...@beekhof.net wrote: It's released already, but the wrong packages got built because I ran the wrong command :-( Fedora 13 packages are uploading now, I'll do opensuse 11.3 in the morning I have seen the tag on Mercurial but I haven't seen any rpm

[Pacemaker] PE ignores monitor failure of stonith:external/rackpdu

2010-10-29 Thread Pavlos Parissis
Hi, I wanted to check what happens when the monitor of a fencing agent fails, thus I disconnected the PDU from the network, reduced the monitor interval and put debug statements in the fencing script. Here are the debug statements in the status code: status) if [ -z $pduip ]; then

Re: [Pacemaker] Pacemaker-1.1.4, when?

2010-10-29 Thread Pavlos Parissis
On 29 October 2010 10:25, Andrew Beekhof and...@beekhof.net wrote: On Fri, Oct 29, 2010 at 8:15 AM, Pavlos Parissis pavlos.paris...@gmail.com wrote: On 28 October 2010 22:55, Andrew Beekhof and...@beekhof.net wrote: It's released already, but the wrong packages got built because I ran

Re: [Pacemaker] Pacemaker-1.1.4, when?

2010-10-29 Thread Pavlos Parissis
On 29 October 2010 11:47, Andrew Beekhof and...@beekhof.net wrote: [...snip..] There won't be unfortunately. Some of the changes we needed to make involved the use of g_hash_table_get_values() which only appeared in glib 2.14 So EPEL5 is stuck on the 1.0 series. Does that mean I shouldn't use

Re: [Pacemaker] Pacemaker-1.1.4, when?

2010-10-29 Thread Pavlos Parissis
On 29 October 2010 12:23, Andrew Beekhof and...@beekhof.net wrote: On Fri, Oct 29, 2010 at 11:58 AM, Pavlos Parissis pavlos.paris...@gmail.com wrote: On 29 October 2010 11:47, Andrew Beekhof and...@beekhof.net wrote: [...snip..] There won't be unfortunately. Some of the changes we needed

[Pacemaker] IP Power 9258HP with external/ippower9258

2010-10-30 Thread Pavlos Parissis
Hi, Does anyone know if the fencing agent ippower9258 works with the IP Power 9258HP PDU? The readme file of the fencing agent mentions the following: Especially IP Power 9258 HP uses a different http command interface. Doesn't that mean that it won't work with the 9258 HP? The fact that Aviosys has different

Re: [Pacemaker] IP Power 9258HP with external/ippower9258

2010-10-30 Thread Pavlos Parissis
On 30 October 2010 16:03, Pavlos Parissis pavlos.paris...@gmail.com wrote: Hi, Does anyone know if the fencing agent ippower9258 works with the IP Power 9258HP PDU? The readme file of the fencing agent mentions the following: Especially IP Power 9258 HP uses a different http command interface

Re: [Pacemaker] Ordering clones and primitives

2010-10-31 Thread Pavlos Parissis
On 30 October 2010 19:55, Lars Kellogg-Stedman l...@oddbit.com wrote: I have a two node cluster that hosts two virtual ips on the same network: primitive proxy_0_ip ocf:heartbeat:IPaddr \ params ip=10.10.10.20 cidr_netmask=255.255.255.0 nic=eth3 primitive proxy_1_ip ocf:heartbeat:IPaddr

[Pacemaker] downgrading to pacemaker-1.0.9.1-1.15.el5

2010-11-01 Thread Pavlos Parissis
Hi, I have been using 1.1.3 on CentOS and I decided to downgrade to 1.0.9.1-1.15.el5. The procedure was the following: stop heartbeat on all cluster members; downgrade to 1.0.9 by doing the following on all cluster members: yum downgrade pacemaker-1.0.9.1-1.15.el5 pacemaker-libs-1.0.9.1-1.15.el5

Re: [Pacemaker] downgrading to pacemaker-1.0.9.1-1.15.el5

2010-11-01 Thread Pavlos Parissis
On 1 November 2010 09:19, Pavlos Parissis pavlos.paris...@gmail.com wrote: Hi, I have been using 1.1.3 on CentOS and I decided to downgrade to 1.0.9.1-1.15.el5. The procedure was the following: stop heartbeat on all cluster members; downgrade to 1.0.9 by doing the following on all cluster

Re: [Pacemaker] Stonith Device APC AP7900

2010-11-02 Thread Pavlos Parissis
On 1 November 2010 15:01, Rick Cone rc...@securepaymentsystems.com wrote: Dejan, Below I had: primitive res_stonith stonith:apcmastersnmp \ params ipaddr=192.1.1.109 port=161 community=sps \ op start interval=0 timeout=60s \ op monitor interval=60s timeout=60s \

Re: [Pacemaker] Stonith Device APC AP7900

2010-11-02 Thread Pavlos Parissis
On 2 November 2010 11:04, Dejan Muhamedagic deja...@fastmail.fm wrote: Hi, On Tue, Nov 02, 2010 at 08:08:32AM +0100, Pavlos Parissis wrote: On 1 November 2010 15:01, Rick Cone rc...@securepaymentsystems.com wrote: Dejan, Below I had: primitive res_stonith

Re: [Pacemaker] PE ignores monitor failure of stonith:external/rackpdu

2010-11-02 Thread Pavlos Parissis
On 2 November 2010 11:22, Dejan Muhamedagic deja...@fastmail.fm wrote: Hi, On Fri, Oct 29, 2010 at 08:37:04AM +0200, Pavlos Parissis wrote: Hi, I wanted to check what happens when the monitor of a fencing agent fails, thus I disconnected the PDU from the network, reduced the monitor

Re: [Pacemaker] Stonith Device APC AP7900

2010-11-02 Thread Pavlos Parissis
On 2 November 2010 12:58, Dejan Muhamedagic deja...@fastmail.fm wrote: [...snip...] Do you know under which conditions pacemaker initiates multiple connections to a fencing device? There are no specific conditions. It can happen by chance because individual clone instances run

Re: [Pacemaker] PE ignores monitor failure of stonith:external/rackpdu

2010-11-02 Thread Pavlos Parissis
On 2 November 2010 13:02, Dejan Muhamedagic deja...@fastmail.fm wrote: [...snip...] Definitely not. If you do the monitor action from the command line does that also return the unexpected exit code: from the code I pasted you can see it returned 1. There is a difference.

Re: [Pacemaker] PE ignores monitor failure of stonith:external/rackpdu

2010-11-02 Thread Pavlos Parissis
On 2 November 2010 13:18, Dejan Muhamedagic deja...@fastmail.fm wrote: Hi, On Tue, Nov 02, 2010 at 01:09:02PM +0100, Pavlos Parissis wrote: On 2 November 2010 13:02, Dejan Muhamedagic deja...@fastmail.fm wrote: [...snip...] Definitely not. If you do the monitor action from

[Pacemaker] drbd on heartbeat links

2010-11-02 Thread Pavlos Parissis
Hi, I am trying to figure out how I can resolve the following scenario Facts 3 nodes 2 DRBD ms resource 2 group resource by default drbd1/group1 runs on node-01 and drbd2/group2 runs on node2 drbd1/group1 can only run on node-01 and node-03 drbd2/group2 can only run on node-02 and node-03 DRBD 

Re: [Pacemaker] drbd on heartbeat links

2010-11-02 Thread Pavlos Parissis
On 2 November 2010 16:15, Dan Frincu dfri...@streamwide.ro wrote: Hi, Pavlos Parissis wrote: Hi, I am trying to figure out how I can resolve the following scenario Facts 3 nodes 2 DRBD ms resource 2 group resource by default drbd1/group1 runs on node-01 and drbd2/group2 runs on node2

Re: [Pacemaker] drbd on heartbeat links

2010-11-02 Thread Pavlos Parissis
On 2 November 2010 22:07, Pavlos Parissis pavlos.paris...@gmail.com wrote: On 2 November 2010 16:15, Dan Frincu dfri...@streamwide.ro wrote: Hi, Pavlos Parissis wrote: Hi, I am trying to figure out how I can resolve the following scenario Facts 3 nodes 2 DRBD ms resource 2 group

Re: [Pacemaker] [DRBD-user] drbd on heartbeat links

2010-11-03 Thread Pavlos Parissis
On 2 November 2010 22:57, Lars Ellenberg lars.ellenb...@linbit.com wrote: On Tue, Nov 02, 2010 at 10:07:17PM +0100, Pavlos Parissis wrote: On 2 November 2010 16:15, Dan Frincu dfri...@streamwide.ro wrote: Hi, Pavlos Parissis wrote: Hi, I am trying to figure out how I can resolve

Re: [Pacemaker] Shutting down resource to allow a more important one to run

2010-11-03 Thread Pavlos Parissis
On 3 November 2010 14:26, Ciro Iriarte cyru...@gmail.com wrote: Hi, I'm planning to run a PDC/BDC Samba pair with HA file service (active/passive with DRBD). Would it be possible to have the second node running SAMBA as a BDC and once the primary node fails, stop the BDC resource and start the

Re: [Pacemaker] Manually controlled cluster

2010-11-04 Thread Pavlos Parissis
On 4 November 2010 11:30, Michael Schwartzkopff mi...@clusterbau.com wrote: On Thursday 04 November 2010 11:23:18 Michael Schwartzkopff wrote: Hi, I want to create a cluster with DRBD, Filesystem, a service and an IP address. Failover should only be triggered by a sys admin and not happen

Re: [Pacemaker] Manually controlled cluster

2010-11-04 Thread Pavlos Parissis
On 4 November 2010 12:01, Michael Schwartzkopff mi...@clusterbau.com wrote: On Thursday 04 November 2010 11:56:54 Pavlos Parissis wrote: On 4 November 2010 11:30, Michael Schwartzkopff mi...@clusterbau.com wrote: On Thursday 04 November 2010 11:23:18 Michael Schwartzkopff wrote: Hi, I want

Re: [Pacemaker] Manually controlled cluster

2010-11-04 Thread Pavlos Parissis
On 4 November 2010 14:18, Vladislav Bogdanov bub...@hoster-ok.com wrote: 04.11.2010 13:36, Pavlos Parissis wrote: ... why do you want that? Customer request. Definitely NOT my idea. something like this could be useful location master-location ms-drbd_02 rule $id=master-rule

Re: [Pacemaker] colocation that doesn't

2010-11-05 Thread Pavlos Parissis
On 5 November 2010 04:07, Vadym Chepkov vchep...@gmail.com wrote: On Nov 4, 2010, at 12:53 PM, Alan Jones wrote: If I understand you correctly, the role of the second resource in the colocation command was defaulting to that of the first Master which is not defined or is untested for

Re: [Pacemaker] Pacemaker-1.1.4, when?

2010-11-09 Thread Pavlos Parissis
On 9 November 2010 09:47, Andrew Beekhof and...@beekhof.net wrote: [...snip...] Since there is no realistic upgrade path to 1.1.4 on EPEL, I am wondering if there is any benefit of staying on 1.1.3 compared to using 1.0.10. It's already out :-) Plus the ordering code is much improved. I've

[Pacemaker] OCF_RESKEY_device to the device to be managed

2010-11-09 Thread Pavlos Parissis
Hi, Has anyone seen the error below on a Filesystem resource? 11:19:33 crmd: [3296]: info: do_lrm_rsc_op: Performing key=13:19:0:9d7002dc-2865-4610-9240-ff844f62205d op=fs_01_stop_0 ) 11:19:33 lrmd: [3293]: info: rsc:fs_01:74: stop 11:19:33 Filesystem[31487]: [31493]: ERROR: Please set

Re: [Pacemaker] OCF_RESKEY_device to the device to be managed

2010-11-09 Thread Pavlos Parissis
On 9 November 2010 11:25, Pavlos Parissis pavlos.paris...@gmail.com wrote: Hi, Has anyone seen the error below on a Filesystem resource? 11:19:33 crmd: [3296]: info: do_lrm_rsc_op: Performing key=13:19:0:9d7002dc-2865-4610-9240-ff844f62205d op=fs_01_stop_0 ) 11:19:33 lrmd: [3293]: info

Re: [Pacemaker] making resource managed

2010-11-09 Thread Pavlos Parissis
On 9 November 2010 14:14, Vadim S. Khondar v.khon...@o3.ua wrote: If after this I edit CIB and apply it, all LRM messages disappear and resource starts managed as it should. What do you mean by "edit CIB"? BTW, I have seen that behavior as well on 1.0.9 Cheers, Pavlos

Re: [Pacemaker] cluster comunication problem??

2010-11-10 Thread Pavlos Parissis
On 10 November 2010 10:49, jiaju liu liujiaj...@yahoo.com.cn wrote: The syslog is as follows. Does this mean there is a communication problem in the cluster? oss1 crmd: [11282]: info: abort_transition_graph: do_te_invoke:191 - Triggered transition abort (complete=0) : Peer Cancelled oss1

Re: [Pacemaker] 1.0.10, released?

2010-11-10 Thread Pavlos Parissis
On 10 November 2010 11:14, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote: hi, On 11/10/2010 11:02 AM, Pavlos Parissis wrote: Hi, Although it has been mentioned in other threads that 1.0.10 is out I don't see any RPMs in http://clusterlabs.org/rpm/epel-5/ where did you hear that from

[Pacemaker] 1.0.10, released?

2010-11-10 Thread Pavlos Parissis
Hi, Although it has been mentioned in other threads that 1.0.10 is out I don't see any RPMs in http://clusterlabs.org/rpm/epel-5/ I don't see any tag for 1.0.10 in Mercurial (http://hg.clusterlabs.org/pacemaker/1.0/), but I don't see a tag for 1.0.9 either. Is it actually released or I have

Re: [Pacemaker] 1.0.10, released?

2010-11-10 Thread Pavlos Parissis
On 10 November 2010 11:16, Andrew Beekhof and...@beekhof.net wrote: On Wed, Nov 10, 2010 at 11:02 AM, Pavlos Parissis pavlos.paris...@gmail.com wrote: Hi, Although it has been mentioned in other threads that 1.0.10 is out It is not. 1.1.3 is out which is a reason to use it instead of 1.0.10

Re: [Pacemaker] Infinite fail-count and migration-threshold after node fail-back

2010-11-11 Thread Pavlos Parissis
On 11 November 2010 13:04, Dan Frincu dfri...@streamwide.ro wrote: Hi, Andrew Beekhof wrote: On Mon, Oct 11, 2010 at 9:40 AM, Dan Frincu dfri...@streamwide.ro wrote: Hi all, I've managed to make this setup work, basically the issue with a symmetric-cluster=false and specifying the

Re: [Pacemaker] start filesystem like this is right?

2010-11-12 Thread Pavlos Parissis
On 12 November 2010 07:37, jiaju liu liujiaj...@yahoo.com.cn wrote: start resource steps step(1) crm configure primitive vol_mpath0 ocf:heartbeat:Filesystem meta target-role=stopped params device=/dev/mapper/mpath0 directory=/mnt/mapper/mpath0 fstype='lustre' op start timeout=300s  op stop

[Pacemaker] understanding scores

2010-11-12 Thread Pavlos Parissis
Hi, I am trying to understand how the scores are calculated based on the output of ptest -sL and I have a few questions. Below are my scores with a line-number column, and at the bottom you will find my configuration. So, let's start: 1 group_color: pbx_service_01 allocation score on node-01: 200 2

Re: [Pacemaker] (no subject)

2010-11-14 Thread Pavlos Parissis
On 13 November 2010 23:24, Bob Schatz bsch...@yahoo.com wrote: Lunch this week? Yes, why not. Where and at what time? Shall we go to the Pacemaker cafeteria like last time? They are always available for us :-) Cheers, Pavlos
