[Linux-cluster] IP Resource behavior with Red Hat Cluster

2010-12-23 Thread Parvez Shaikh
Hi all,

I am using Red Hat Cluster 6.2.0 (the version shown by cman_tool
version) on Red Hat Enterprise Linux 5.5.

I am on a host that has multiple network interfaces, all (or some) of
which may be active when I try to bring my IP resource up.

My cluster has a simple configuration -
It has only two nodes, and the service basically consists of only an IP
resource. I had to choose a random private IP address for test/debugging
purposes (192.168).

When I tried to start the service, it failed with the message -

clurgmgrd: [31853]: debug 192.168.25.135 is not configured

I manually made this virtual IP available on the host and then started
the service; it worked -

clurgmgrd: [31853]: debug 192.168.25.135 already configured


My question is - is it a prerequisite for the IP address to be manually
added before it can be protected by the cluster?

Thanks
Parvez

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


Re: [Linux-cluster] IP Resource behavior with Red Hat Cluster

2010-12-24 Thread Parvez Shaikh
Hi Rajagopal,

Thank you for your response

I have created a cluster configuration by adding an IP resource with
the value 192.168.25.153 (an arbitrary value) and created a service which
just has the IP resource in it. I have set all the requisite configuration,
such as two-node mode, node names, failover, fencing, etc.

Upon trying to start the service (enable the service), it failed -

clurgmgrd: [31853]: debug 192.168.25.135 is not configured

Then I manually added this IP to the host -

ifconfig eth0:1 192.168.25.135

Then the service could start, but it gave the message -

clurgmgrd: [31853]: debug 192.168.25.135 already configured

So do I have to add the virtual interface manually (as above, or by some
other method?) before I can start a service with an IP resource in it?

Thanks
Parvez

On Fri, Dec 24, 2010 at 11:30 AM, Rajagopal Swaminathan
raju.rajs...@gmail.com wrote:
 Greetings,

 On Fri, Dec 24, 2010 at 5:33 AM, Parvez Shaikh
 parvez.h.sha...@gmail.com wrote:
 Hi all,

 I manually made this virtual IP available on host and then started
 service it worked -


 Can you please elaborate? Did you try to assign the IP to the ethX devices
 and then ping?

 clurgmgrd: [31853]: debug 192.168.25.135 already configured


 My question is - Is it prerequisite for IP resource to be manually
 added before it can be protected via cluster?


 Every resource/service has to be added to the cluster.

 And they cannot be used by anything else.

 Regards,

 Rajagopal

 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster


--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


Re: [Linux-cluster] IP Resource behavior with Red Hat Cluster

2010-12-24 Thread Parvez Shaikh
Thanks a ton Jakov. It has clarified my doubts.

Yours gratefully,
Parvez

On Sat, Dec 25, 2010 at 6:34 AM, Jakov Sosic jakov.so...@srce.hr wrote:
 On 12/24/2010 05:46 PM, Parvez Shaikh wrote:
 Hi Jakov

 Thank you for your response. My two hosts have multiple network
 interfaces (ethernet cards). I understood from your email that the
 IPs corresponding to the cluster node names of both hosts should be in
 the same subnet before the cluster can bring the virtual IP up.

 No... you misunderstood me. I meant that if the virtual address is
 192.168.25.X, then you have to have an interface on each node that is set
 up with an IP address from the same subnet. That interface does not
 need to correspond to the cluster node name. For example:

 node1 - eth0 - 192.168.1.11 (netmask 255.255.255.0)
 node2 - eth0 - 192.168.1.12 (netmask 255.255.255.0)

 IP resource - 192.168.25.100


 Now, how do you expect the cluster to know what to do with the IP resource?
 On which interface can the cluster place 192.168.25.100? eth0? But why eth0?
 And what is the netmask? What about routes?

 So, you need to have, for example, eth1 on both machines set up in the
 same subnet, so that the cluster can attach the IP address from the IP
 resource to that exact interface (which is set up statically). So you also
 have to have, for example:

 node1 - eth1 - 192.168.25.47 (netmask 255.255.255.0)
 node2 - eth1 - 192.168.25.48 (netmask 255.255.255.0)

 Now, rgmanager will know where to activate the IP resource, because
 192.168.25.100 belongs to the 192.168.25.0/24 subnet, which is active on
 node1/eth1 and node2/eth1.

 If you were to have another IP resource, for example 192.168.240.44, you
 would need another interface with another set of static IP addresses on
 every host you intend to run that IP resource on...
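
 As a concrete sketch (the addresses are just the example values above, and
 monitor_link is a standard attribute of the rgmanager ip resource agent),
 you first confirm that a static address in the right subnet exists on each
 node, and then declare only the virtual address in cluster.conf:

     # on node1 and node2 - an address in the resource's subnet must exist
     ip -4 addr show | grep "inet 192.168.25."

     # cluster.conf - rgmanager picks the interface whose subnet matches
     <ip address="192.168.25.100" monitor_link="1"/>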


 I hope you get it correctly now.





 --
 Jakov Sosic
 www.srce.hr

 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster


--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


Re: [Linux-cluster] IP Resource behavior with Red Hat Cluster

2010-12-26 Thread Parvez Shaikh
Hi

I chose 192.168.13.15 as my IP resource and had eth3 configured with
192.168.13.1, but it still failed with this error -

Dec 27 17:35:32 datablade1 clurgmgrd[31853]: err Error storing ip: Duplicate
Dec 27 17:36:55 datablade1 clurgmgrd[31853]: notice Starting
disabled service service:service1
Dec 27 17:36:55 datablade1 clurgmgrd[31853]: notice start on ip
192.168.13.15/24 returned 1 (generic error)
Dec 27 17:36:55 datablade1 clurgmgrd[31853]: warning #68: Failed to
start service:service1; return value: 1

Below is the set of interfaces -

eth0  Link encap:Ethernet  HWaddr 00:10:18:66:15:70
  inet addr:192.168.10.1  Bcast:192.168.10.255  Mask:255.255.255.0
  inet6 addr: fe80::210:18ff:fe66:1570/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:125 errors:0 dropped:0 overruns:0 frame:0
  TX packets:305 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:32679 (31.9 KiB)  TX bytes:42477 (41.4 KiB)
  Interrupt:177 Memory:9800-98012800

eth1  Link encap:Ethernet  HWaddr 00:10:18:66:15:72
  inet addr:192.168.11.1  Bcast:192.168.11.255  Mask:255.255.255.0
  inet6 addr: fe80::210:18ff:fe66:1572/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:1237019 errors:0 dropped:0 overruns:0 frame:0
  TX packets:1919245 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:183885611 (175.3 MiB)  TX bytes:337885336 (322.2 MiB)
  Interrupt:154 Memory:9a00-9a012800

eth2  Link encap:Ethernet  HWaddr 00:10:18:66:15:74
  inet addr:192.168.12.1  Bcast:192.168.12.255  Mask:255.255.255.0
  inet6 addr: fe80::210:18ff:fe66:1574/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:419008 errors:0 dropped:0 overruns:0 frame:0
  TX packets:29 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:26822898 (25.5 MiB)  TX bytes:5992 (5.8 KiB)
  Interrupt:185 Memory:9400-94012800

eth3  Link encap:Ethernet  HWaddr 00:10:18:66:15:76
  inet addr:192.168.13.1  Bcast:192.168.13.255  Mask:255.255.255.0
  UP BROADCAST MULTICAST  MTU:1500  Metric:1
  RX packets:0 errors:0 dropped:0 overruns:0 frame:0
  TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
  Interrupt:162 Memory:9600-96012800


On Sat, Dec 25, 2010 at 6:34 AM, Jakov Sosic jakov.so...@srce.hr wrote:
 On 12/24/2010 05:46 PM, Parvez Shaikh wrote:
 Hi Jakov

 Thank you for your response. My two hosts have multiple network
 interfaces (ethernet cards). I understood from your email that the
 IPs corresponding to the cluster node names of both hosts should be in
 the same subnet before the cluster can bring the virtual IP up.

 No... you misunderstood me. I meant that if the virtual address is
 192.168.25.X, then you have to have an interface on each node that is set
 up with an IP address from the same subnet. That interface does not
 need to correspond to the cluster node name. For example:

 node1 - eth0 - 192.168.1.11 (netmask 255.255.255.0)
 node2 - eth0 - 192.168.1.12 (netmask 255.255.255.0)

 IP resource - 192.168.25.100


 Now, how do you expect the cluster to know what to do with IP resource?
 On which interface can cluster glue 192.168.25.100? eth0? But why eth0?
 And what is the netmask? What about routes?

 So, you need to have, for example, eth1 on both machines set up in the
 same subnet, so that the cluster can attach the IP address from the IP
 resource to that exact interface (which is set up statically). So you also
 have to have, for example:

 node1 - eth1 - 192.168.25.47 (netmask 255.255.255.0)
 node2 - eth1 - 192.168.25.48 (netmask 255.255.255.0)

 Now, rgmanager will know where to activate the IP resource, because
 192.168.25.100 belongs to the 192.168.25.0/24 subnet, which is active on
 node1/eth1 and node2/eth1.

 If you were to have another IP resource, for example 192.168.240.44, you
 would need another interface with another set of static IP addresses on
 every host you intend to run that IP resource on...


 I hope you get it correctly now.





 --
 Jakov Sosic
 www.srce.hr

 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster


--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


Re: [Linux-cluster] IP Resource behavior with Red Hat Cluster

2010-12-26 Thread Parvez Shaikh
Hi all

The issue has been resolved. After debugging a bit I found that the link on
the eth interface was not detected -

ethtool ethX | grep "Link detected:" | awk '{print $3}'
Output - no

After working around this, I could get my IP resource up.
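
For anyone hitting the same thing, a rough check across all the interfaces
listed above might look like this (interface names as shown earlier; a
sketch, not a polished script):

for i in eth0 eth1 eth2 eth3; do
    echo -n "$i: "
    ethtool $i | grep "Link detected:" | awk '{print $3}'
done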

Thank you for your kind suggestions and interest in this problem

Gratefully yours

On Mon, Dec 27, 2010 at 12:18 PM, Rajagopal Swaminathan
raju.rajs...@gmail.com wrote:
 Greetings,

 On Mon, Dec 27, 2010 at 9:51 AM, Parvez Shaikh
 parvez.h.sha...@gmail.com wrote:
 Hi


 Dec 27 17:35:32 datablade1 clurgmgrd[31853]: err Error storing ip: 
 Duplicate
 Dec 27 17:36:55 datablade1 clurgmgrd[31853]: notice Starting
 disabled service service:service1
 Dec 27 17:36:55 datablade1 clurgmgrd[31853]: notice start on ip
 192.168.13.15/24 returned 1 (generic error)
 Dec 27 17:36:55 datablade1 clurgmgrd[31853]: warning #68: Failed to
 start service:service1; return value: 1

 Below is set of interfaces -


 What does the ip addr show command say?

 Regards,

 Rajagopal

 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster


--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


Re: [Linux-cluster] Determining red hat cluster version

2011-01-05 Thread Parvez Shaikh
Hi Fabio

This produces the output -

cman-2.0.115-29.el5

So does this indicate that 2.0.115-29 is the version?

On Thu, Jan 6, 2011 at 12:34 PM, Fabio M. Di Nitto fdini...@redhat.com wrote:
 On 1/6/2011 6:24 AM, Parvez Shaikh wrote:
 Hi all,

 Is there any command which states Red Hat cluster version?

 I tried cman_tool version and ccs_tool -V; both produce different
 results, most likely reporting their own versions (not that of the Cluster
 Suite).


 rpm -q -f $(which cman_tool) is one option, otherwise you need to parse
 cman_tool protocol version manually.

 Fabio

 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster


--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


Re: [Linux-cluster] Determining red hat cluster version

2011-01-06 Thread Parvez Shaikh
Thanks Fabio

Is this version the same as what can be referred to as the version of Red
Hat Cluster Suite?

The reason I am asking is that, as part of RHCS, there are various
components (Cluster_Administration-en-US, cluster-cim, cluster-snmp,
cman, rgmanager, luci, ricci, etc.) and each of them shows its own
version -

cman has the version shown below, rgmanager has version 2.0.52, cluster-cim,
cluster-snmp and modcluster have version 0.12.1, and system-config-cluster
has version 1.0.57.

Is there one version number referring to the Cluster Suite as a whole that
encompasses the entire set of components (each with their own versions,
maybe)?
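
For what it is worth, the individual component versions can be pulled in one
go with rpm (the package names are the ones listed above; any that are not
installed will simply be reported as not installed):

rpm -q cman rgmanager luci ricci cluster-cim cluster-snmp modcluster system-config-cluster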

Gratefully yours

On Thu, Jan 6, 2011 at 1:14 PM, Fabio M. Di Nitto fdini...@redhat.com wrote:
 On 1/6/2011 8:28 AM, Parvez Shaikh wrote:
 Hi Fabio

 This produces output -

 cman-2.0.115-29.el5

 So does it indicate 2.0.115-29 is version?

 yes

 Fabio

 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster


--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


[Linux-cluster] configuring bladecenter fence device

2011-01-06 Thread Parvez Shaikh
Hi all,

From the RHCS documentation, I could see that bladecenter is one of the
supported fence devices -
http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/Cluster_Administration/ap-fence-device-param-CA.html

Table B.9. IBM Blade Center
Field - Description
Name - A name for the IBM BladeCenter device connected to the cluster.
IP Address - The IP address assigned to the device.
Login - The login name used to access the device.
Password - The password used to authenticate the connection to the device.
Password Script (optional) - The script that supplies a password for
access to the fence device. Using this supersedes the Password parameter.
Blade - The blade of the device.
Use SSH (RHEL 5.4 and later) - Indicates that the system will use SSH to
access the device.

As per my understanding, the IP address is the IP address of the management
module of the IBM BladeCenter, and login/password are the credentials used
to access it.

However, I did not understand the parameter 'Blade'. What role does it play in fencing?

In a situation where there are two blades - Blade-1 and Blade-2 - and
Blade-1 goes down (hardware node failure), Blade-2 should fence out
Blade-1; in that situation, fenced on Blade-2 should power off(?)
Blade-1 using fence_bladecenter. So how should the snippet of the
cluster.conf file below look? -


<clusternodes>
    <clusternode name="blade1" nodeid="1" votes="1">
        <fence>
            <method name="1">
                <device blade="?" name="BLADECENTER"/>
            </method>
        </fence>
    </clusternode>
    <clusternode name="blade2" nodeid="2" votes="1">
        <fence>
            <method name="1">
                <device blade="" name="BLADECENTER"/>
            </method>
        </fence>
    </clusternode>
</clusternodes>

In which situation would fence_bladecenter be used to power on the blade?

Yours gratefully

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


Re: [Linux-cluster] configuring bladecenter fence device

2011-01-06 Thread Parvez Shaikh
Hi Ben

Thanks a ton for the information below. But I have a doubt about the
cluster.conf snippet below -

   <clusternode name="node1" votes="1">
       <fence>
           <method name="1">
               <device blade="2" name="chassis_fence"/>
           </method>
       </fence>
   </clusternode>

Here, for node1, the device blade is "2". Does this mean node1 is blade[2]
from the AMM's perspective? So in order to fence node1, fence_bladecenter
would turn off blade[2]?

Thanks

On Thu, Jan 6, 2011 at 9:36 PM, Ben Turner btur...@redhat.com wrote:
 To address:

 As per my understanding, IP address is IP address of management module
 of IBM blade center, login/password represent credentials to access
 the same.

 Correct.

 However did not get the parameter 'Blade'. How does it play role in
 fencing?

 If I recall correctly, blade= is the identifier used to identify the
 blade in the AMM.  I can't remember if it is the number of a slot or a
 user-defined name.  It corresponds to:

 # fence_bladecenter -h

   -n, --plug=id                Physical plug number on device or
                                        name of virtual machine
 In the fencing code:

        "port" : {
                "getopt" : "n:",
                "longopt" : "plug",
                "help" : "-n, --plug=id                Physical plug number on device or\n" +
                         "                             name of virtual machine",
                "required" : "1",
                "shortdesc" : "Physical plug number or name of virtual machine",
                "order" : 1 },

 To test this try running:

 /sbin/fence_bladecenter -a <ip or hostname of bladecenter> -l <login> \
     -p <passwd> -n <blade number of the blade you want to fence> -o status -v

 An example cluster.conf looks like:

                 <clusternode name="node1" votes="1">
                         <fence>
                                 <method name="1">
                                         <device blade="2" name="chassis_fence"/>
                                 </method>
                         </fence>
                 </clusternode>
                 <clusternode name="node2" votes="1">
                         <fence>
                                 <method name="1">
                                         <device blade="3" name="chassis_fence"/>
                                 </method>
                         </fence>
                 </clusternode>

        <fencedevices>
                 <fencedevice agent="fence_bladecenter" ipaddr="XXX.XXX.1.143"
                              login="rchs_fence" name="chassis_fence" passwd="XXX"/>
         </fencedevices>

 -Ben




 - Original Message -
 Hi all,

 From RHCS documentation, I could see that bladecenter is one of the
 fence devices -
 http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/Cluster_Administration/ap-fence-device-param-CA.html

 Table B.9. IBM Blade Center
 Field - Description
 Name - A name for the IBM BladeCenter device connected to the cluster.
 IP Address - The IP address assigned to the device.
 Login - The login name used to access the device.
 Password - The password used to authenticate the connection to the device.
 Password Script (optional) - The script that supplies a password for
 access to the fence device. Using this supersedes the Password parameter.
 Blade - The blade of the device.
 Use SSH (RHEL 5.4 and later) - Indicates that the system will use SSH to
 access the device.

 As per my understanding, IP address is IP address of management module
 of IBM blade center, login/password represent credentials to access
 the same.

 However did not get the parameter 'Blade'. How does it play role in
 fencing?

 In a situation where there are two blades - Blade-1 and Blade-2 and
 if Blade-1 goes down(hardware node failure), Blade-2 should fence out
 Blade-1, in that situation fenced on Blade-2 should power off(?)
 blade-2 using fence_bladecenter, so how should below sniplet of
 cluster.conf file should look like? -


 <clusternodes>
     <clusternode name="blade1" nodeid="1" votes="1">
         <fence>
             <method name="1">
                 <device blade="?" name="BLADECENTER"/>
             </method>
         </fence>
     </clusternode>
     <clusternode name="blade2" nodeid="2" votes="1">
         <fence>
             <method name="1">
                 <device blade="" name="BLADECENTER"/>
             </method>
         </fence>
     </clusternode>
 </clusternodes>

 In which situation fence_bladecenter would be used to power on the
 blade?

 Yours gratefully

 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster

 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster


--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


Re: [Linux-cluster] configuring bladecenter fence device

2011-01-06 Thread Parvez Shaikh
Thanks Hugo

Yours gratefully


On Fri, Jan 7, 2011 at 11:09 AM, Hugo Lombard h...@elizium.za.net wrote:
 On Fri, Jan 07, 2011 at 10:12:16AM +0530, Parvez Shaikh wrote:

                <clusternode name="node1" votes="1">
                        <fence>
                                <method name="1">
                                        <device blade="2" name="chassis_fence"/>
                                </method>
                        </fence>
                </clusternode>

 Here for node1 device blade is 2. Does it mean node1 is blade[2]
 from AMM perspective? So in order to fence out node1 fence_bladecenter
 would turn off blade[2]?


 Hi Parvez

 We use BladeCenters for our clusters, and I can confirm that the
 'blade=2' parameter will translate to 'blade[2]' on the AMM.  IOW, the
 '2' is the slot number that the blade is in.

 Two more things that might be of help:

 - The user specified in the 'login' parameter under the fencedevice
  should be a 'Blade Administrator' for the slots in question.

 - If you're running with SELinux enabled, check that the
  'fenced_can_network_connect' boolean is set to 'on'.
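
   A quick way to check and, if needed, set that boolean (standard
   getsebool/setsebool usage; -P makes the change persistent across reboots):

     getsebool fenced_can_network_connect
     setsebool -P fenced_can_network_connect on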

 Regards

 --
 Hugo Lombard

 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster


--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


[Linux-cluster] Error while manual fencing and output of clustat

2011-01-10 Thread Parvez Shaikh
Dear experts,

I have a two-node cluster (node1 and node2), and manual fencing is
configured. Service S2 is running on node2. To make failover happen,
I shut down node2. I see the following messages in /var/log/messages -

agent fence_manual reports: failed: fence_manual
no node name

fence_ack_manual -n node2 doesn't work, saying there is no FIFO in
/tmp. fence_ack_manual -n node2 -e does work, and then service S2 fails
over to node1.

I am trying to find out why fence_manual is reporting an error. node2 is a
pingable hostname and its entry is in /etc/hosts on node1 (and vice
versa). I also see that after failover, when I run clustat -x, I get the
cluster status (in XML format) with -

<?xml version="1.0"?>
<clustat version="4.1.1">
  <groups>
    <group name="service:S" state="111" state_str="starting" flags="0"
           flags_str="" owner="node1" last_owner="node1" restarts="0"
           last_transition="1294676678" last_transition_str="xx"/>
  </groups>
</clustat>

I was expecting last_owner to correspond to node2 (because that is the
node which was running service S and has failed), which would indicate
that the service is failing over FROM node2. Is there a way that a node in
the cluster (the node onto which the service is failing over) can determine
from which node the given service is failing over?

Any inputs would be greatly appreciated.

Thanks

Yours gratefully

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


Re: [Linux-cluster] Error while manual fencing and output of clustat

2011-01-10 Thread Parvez Shaikh
Thanks Xavier.

It resolved the error on fencing.

However, I am still grappling with the issue of finding the name of the
failed cluster node on the other cluster node, i.e., the one to which the
service from the failed node has failed over.

I was using the output of clustat -x -S <service name> and parsing the XML
to obtain the value of the last_owner field.

Any input on how to find out the name of the failed node on the other
cluster node, on which the services from the failed node are starting?

Thanks

On Mon, Jan 10, 2011 at 6:58 PM, Xavier Montagutelli
xavier.montagute...@unilim.fr wrote:
 Hello Parvez,

 On Monday 10 January 2011 09:51:14 Parvez Shaikh wrote:
 Dear experts,

 I have two node cluster(node1 and node2), and manual fencing is
 configured. Service S2 is running on node2. To ensure failover happen,
 I shutdown node2.. I see following messages in /var/log/messages -

                     agent fence_manual reports: failed: fence_manual
 no node name

 I am not an expert, but could you show us your cluster.conf file ?

 You need to give a nodename attribute to the fence_manual agent somewhere,
 the error message makes me think it's missing.

 For example :

        <fencedevices>
                <fencedevice agent="fence_manual" name="my_fence_manual"/>
        </fencedevices>
 ...
 <clusternode name="node2" ...>
     <fence>
         <method name="1">
             <device name="my_fence_manual" nodename="node2"/>
         </method>
     </fence>
 </clusternode>


 fence_ack_manual -n node2 doesn't work saying there is no FIFO in
 /tmp. fence_ack_manual -n node2 -e do work and then service S2 fails
 over to node2.

 Trying to find out why fence_manual is reporting error? node2 is
 pingable hostname and its entry is in /etc/hosts of node1 (and vice
 versa).  I also see that after failover when I do clustat -x I get
 cluster status (in XML format) with -

 <?xml version="1.0"?>
 <clustat version="4.1.1">
   <groups>
     <group name="service:S" state="111" state_str="starting" flags="0"
            flags_str="" owner="node1" last_owner="node1" restarts="0"
            last_transition="1294676678" last_transition_str="xx"/>
   </groups>
 </clustat>

 I was expecting last_owner would correspond to node2(because this is
 node which was running service S and has failed); which would indicate
 that service is failing over FROM node2. Is there a way that node in
 cluster (a node on which service is failing over) could determine from
 which node the given service is failing over?

 Any inputs would be greatly appreciated.

 Thanks

 Yours gratefully

 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster


 --
 Xavier Montagutelli                      Tel : +33 (0)5 55 45 77 20
 Service Commun Informatique              Fax : +33 (0)5 55 45 75 95
 Universite de Limoges
 123, avenue Albert Thomas
 87060 Limoges cedex

 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster


--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


[Linux-cluster] Determining failed node on another node of cluster during failover

2011-01-11 Thread Parvez Shaikh
Hi all,

Taking this question from another thread, here is a challenge that I am facing -

Following is simple cluster configuration -

Node1, node2, node3, and node4 are part of the cluster; it is an
unrestricted, unordered failover domain with an active-active NxN
configuration.

So node2 can take over services from node1, node3 or node4 when any of
these nodes (1, 3, 4) fails (e.g. power failure).

In that event I want to find out which node has failed over to node2. I
was invoking clustat -x -S <service name> on node2 in my custom agent and
parsing the last_owner field to obtain the name of the node on which the
service was previously running.

This, however, doesn't seem to work if I shut a node down (but it works if
I migrate the service from one node to another using clusvcadm).

Is there any way that I can find out which node has failed during
failover of a service onto a standby node? Is there any tool which I might
have missed, or some command which I can send to ccsd to get this
information?

Thanks

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


Re: [Linux-cluster] Determining failed node on another node of cluster during failover

2011-01-12 Thread Parvez Shaikh
Hi

Is the monitoring package part of RHCS? What is the name of this component?

Is there any other mechanism which doesn't require parsing
/var/log/messages to determine, on the standby node, which node has left
the cluster before failover is complete?

Thanks

On Wed, Jan 12, 2011 at 2:58 PM, Kit Gerrits kitgerr...@gmail.com wrote:

 Hello,

 If you want to find out which cluster node has failed, you could either
 check /var/log/messages and see which member has left the cluster, or you
 can set up monitoring to check if your servers are all in good shape.

 If you are running a cluster, I would suggest also setting up monitoring.
 The monitoring package can then notify you if any cluster member fails.


 Regards,

 Kit

 -Original Message-
 From: linux-cluster-boun...@redhat.com
 [mailto:linux-cluster-boun...@redhat.com] On Behalf Of Parvez Shaikh
 Sent: woensdag 12 januari 2011 7:04
 To: linux clustering
 Subject: [Linux-cluster] Determining failed node on another node of
 clusterduring failover

 Hi all,

 Taking this question from another thread, here is a challenge that I am
 facing -

 Following is simple cluster configuration -

 Node 1, node 2, node 3, and node4 are part of cluster, its unrestricted
 unordered fail-over domain with active - active nxn configuration

 So a node 2 can get services from node1, node3 or node4 when any of
 these(1,3,4) node fails(e.g. power failure).

 In that event I want to find out which of the node has failed over node2, I
 was invoking clustat -x -S service name on node2 in my custom agent and
 was parsing for last_owner field to obtain name of node on which service
 was previously running.

 This however doesn't seem to be working in case if I shutdown node(but works
 if I migrate service from one node to another using clusvcadm)

 Is there anyway that I can find out which node has failed during failover of
 service on a standby node? Any tool which I might have missed or some
 command which I can send to ccsd to get this information

 Thanks

 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster

 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster


--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


Re: [Linux-cluster] Determining failed node on another node of cluster during failover

2011-01-12 Thread Parvez Shaikh
Hi,

I have been using the clustat command (clustat -x -s <servicename>) to get
the following XML output -

<?xml version="1.0"?>
<clustat version="4.1.1">
  <groups>
    <group name="service:service_on_node1" state="112"
           state_str="started" flags="0" flags_str="" owner="node1"
           last_owner="none" restarts="0" last_transition="1294752663"
           last_transition_str="Tue Jan 11 19:01:03 2011"/>
  </groups>
</clustat>

I was under the impression that the last_owner field in the above XML
should give me the node name where the service was last running. I was
parsing this XML output to obtain that information.
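
As an aside, pulling that attribute out does not strictly need an XML
parser; a rough sketch with grep (the service name is the one from the
output above):

clustat -x -s service_on_node1 | grep -o 'last_owner="[^"]*"'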

Note that this holds true if you migrate or relocate a service from one
node to another using clusvcadm, or from Conga or system-config-cluster,
BUT if a node is shut down and the service relocates to another node,
last_owner is either 'none' or the same as the current node onto which the
service has relocated.

Parsing /var/log/messages is easy but not an optimal solution; it requires
grepping the entire log file for the specific cluster manager message in
which the failed node's name appears.
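
If grepping is acceptable, something as simple as the following narrows it
down (the message format is assumed to match the fenced lines quoted
elsewhere on this list; adjust the pattern to your own logs):

grep 'fenced\[' /var/log/messages | tail -n 5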



On Thu, Jan 13, 2011 at 4:35 AM, Kit Gerrits kitgerr...@gmail.com wrote:

 Hello,

 The Clustering software itself monitors nodes and devices in use by cluster
 services, but logs to /var/log/messages.
 A quick overview is presented by the 'clustat' command.

 Monitoring tools are freely available for any platform.
 Basic monitoring in Linux is available with Big Brother, Cacti, OpenNMS or
 Nagios (in order of increasing complexity).
 If you're bound to windows, maybe try ServersCheck .


 Parsing logs can be trivial, once you know how.
 What do you want to know and when do you want to know it?

 Have you looked at 'clustat' and 'cman_tool'?


 Regards,

 Kit

 -Original Message-
 From: linux-cluster-boun...@redhat.com
 [mailto:linux-cluster-boun...@redhat.com] On Behalf Of Parvez Shaikh
 Sent: woensdag 12 januari 2011 11:01
 To: linux clustering
 Subject: Re: [Linux-cluster] Determining failed node on another node of
 clusterduring failover

 Hi

 Is monitoring package part of RHCS? What is name of this component?

 Is there any other mechanism which doesn't require to parse log/messages to
 determine which node has left the cluster on stand by node before failover
 is complee?

 Thanks

 On Wed, Jan 12, 2011 at 2:58 PM, Kit Gerrits kitgerr...@gmail.com wrote:

 Hello,

 If you want to find out which cluster node has failed, you could
 either check /var/log/messages and see which member has left the
 cluster, or you can set up monitoring to check if your servers are all in
 good shape.

 If you are running a cluster, I would suggest also setting up monitoring.
 The monitoring package can then notify you if any cluster member fails.


 Regards,

 Kit

 -Original Message-
 From: linux-cluster-boun...@redhat.com
 [mailto:linux-cluster-boun...@redhat.com] On Behalf Of Parvez Shaikh
 Sent: woensdag 12 januari 2011 7:04
 To: linux clustering
 Subject: [Linux-cluster] Determining failed node on another node of
 clusterduring failover

 Hi all,

 Taking this question from another thread, here is a challenge that I
 am facing -

 Following is simple cluster configuration -

 Node 1, node 2, node 3, and node4 are part of cluster, its
 unrestricted unordered fail-over domain with active - active nxn
 configuration

 So a node 2 can get services from node1, node3 or node4 when any of
 these(1,3,4) node fails(e.g. power failure).

 In that event I want to find out which of the node has failed over
 node2, I was invoking clustat -x -S service name on node2 in my
 custom agent and was parsing for last_owner field to obtain name of
 node on which service was previously running.

 This however doesn't seem to be working in case if I shutdown node(but
 works if I migrate service from one node to another using clusvcadm)

 Is there anyway that I can find out which node has failed during
 failover of service on a standby node? Any tool which I might have
 missed or some command which I can send to ccsd to get this
 information

 Thanks

 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster

 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster


 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster

 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster


--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


Re: [Linux-cluster] Determining failed node on another node of cluster during failover

2011-01-13 Thread Parvez Shaikh
Hi

Any idea on how to get the name of the failed node using the available
cluster tools or commands? I have tried clustat, but it seems to produce
unexpected output.

I need to obtain this information on the target host/node, i.e., the one
to which the service is relocating as part of the failover.

Thanks in advance

Gratefully yours



On 1/13/11, Parvez Shaikh parvez.h.sha...@gmail.com wrote:
 Hi,

 I have been using clustat command. clustat -x -s servicename to get
 following XML file -

  <?xml version="1.0"?>
  <clustat version="4.1.1">
    <groups>
      <group name="service:service_on_node1" state="112"
             state_str="started" flags="0" flags_str="" owner="node1"
             last_owner="none" restarts="0" last_transition="1294752663"
             last_transition_str="Tue Jan 11 19:01:03 2011"/>
    </groups>
  </clustat>

 I was under impression that last_owner field in the above XML file
 should give me node name where service was last running. I was parsing
 this XML file to obtain this information.

 Note that, this holds true if you migrate or relocate service from one
 node to another using clusvcadm or from conga or system-config-luster
 BUT if node is shutdown and service relocate to another node,
 last_owner is either 'none' or same as current node on which service
 is relocated.

 Parsing var/messages/log is easy but not optimal solution, it will
 need greping entire log file for some specific message where failed
 node name is appearing in clumgr messages.



 On Thu, Jan 13, 2011 at 4:35 AM, Kit Gerrits kitgerr...@gmail.com wrote:

 Hello,

 The Clustering software itself monitors nodes and devices in use by
 cluster
 services, but logs to /var/log/messages.
 A quick overview is presented by the 'clustat' command.

 Monitoring tools are freely available for any platform.
 Basic monitoring in Linux is available with Big Brother, Cacti, OpenNMS
 or
 Nagios (in order of increasing complexity).
 If you're bound to windows, maybe try ServersCheck .


 Parsing logs can be trivial, once you know how.
 What do you want to know and when do you want to know it?

 Have you looked at 'clustat' and 'cman_tool'?


 Regards,

 Kit

 -Original Message-
 From: linux-cluster-boun...@redhat.com
 [mailto:linux-cluster-boun...@redhat.com] On Behalf Of Parvez Shaikh
 Sent: woensdag 12 januari 2011 11:01
 To: linux clustering
 Subject: Re: [Linux-cluster] Determining failed node on another node of
 clusterduring failover

 Hi

 Is monitoring package part of RHCS? What is name of this component?

 Is there any other mechanism which doesn't require to parse log/messages
 to
 determine which node has left the cluster on stand by node before
 failover
 is complee?

 Thanks

 On Wed, Jan 12, 2011 at 2:58 PM, Kit Gerrits kitgerr...@gmail.com
 wrote:

 Hello,

 If you want to find out which cluster node has failed, you could
 either check /var/log/messages and see which member has left the
 cluster, or you can set up monitoring to check if your servers are all
 in
 good shape.

 If you are running a cluster, I would suggest also setting up
 monitoring.
 The monitoring package can then notify you if any cluster member fails.


 Regards,

 Kit

 -Original Message-
 From: linux-cluster-boun...@redhat.com
 [mailto:linux-cluster-boun...@redhat.com] On Behalf Of Parvez Shaikh
 Sent: woensdag 12 januari 2011 7:04
 To: linux clustering
 Subject: [Linux-cluster] Determining failed node on another node of
 clusterduring failover

 Hi all,

 Taking this question from another thread, here is a challenge that I
 am facing -

 Following is simple cluster configuration -

 Node 1, node 2, node 3, and node4 are part of cluster, its
 unrestricted unordered fail-over domain with active - active nxn
 configuration

 So a node 2 can get services from node1, node3 or node4 when any of
 these(1,3,4) node fails(e.g. power failure).

 In that event I want to find out which of the node has failed over
 node2, I was invoking clustat -x -S service name on node2 in my
 custom agent and was parsing for last_owner field to obtain name of
 node on which service was previously running.

 This however doesn't seem to be working in case if I shutdown node(but
 works if I migrate service from one node to another using clusvcadm)

 Is there anyway that I can find out which node has failed during
 failover of service on a standby node? Any tool which I might have
 missed or some command which I can send to ccsd to get this
 information

 Thanks

 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster

 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster


 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster

 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster



--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster


[Linux-cluster] Questions related to cluster quorum and fencing

2011-01-18 Thread Parvez Shaikh
Hi all,

Quorum -
The questions are a bit theoretical. I have gone through the documentation
and man pages and understood that a cluster is quorate if the cluster, or a
partition of it, has nodes whose votes equal or exceed the expected_votes in
the cman section of the cluster.conf file (with no requirement mandating the
use of a quorum disk).
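
For reference, a two-node cluster normally has to declare its special case
explicitly in the cman section (the standard two-node form), since after one
node fails the surviving node's single vote could never reach a majority on
its own:

<cman expected_votes="1" two_node="1"/>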

So how does the cluster being quorate or non-quorate affect the functioning
of the cluster or its services? If a cluster is non-quorate, does that
indicate an alarming situation, and why?

If a cluster is composed of resource groups that include only an IP resource
and a script resource monitoring my application server listening on that IP
resource (no shared disk or other shared resource between the cluster
nodes), then is the cluster being quorate (or non-quorate) important for the
services and/or the cluster?

Fencing -
Are fencing and the cluster being quorate or non-quorate related? I tried
one experiment in which I removed fencing for the cluster nodes and shut
down one of the nodes in the cluster. I got a message in /var/log/messages
indicating that fencing failed for the node, and the service was not failed
over from that node. So is fencing mandatory even if there is no shared disk
between the two cluster nodes?

Also, is a cluster non-quorate during the time window when a node has failed
and has not yet been fenced successfully?

Yours gratefully
--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

[Linux-cluster] Running cluster tools using non-root user

2011-01-25 Thread Parvez Shaikh
Hi all

Is it possible to run cluster tools like clustat or clusvcadm etc. as a
non-root user?

If yes, to which groups should this user belong? Otherwise, can this be
done using sudo (and the sudoers file)?

As of now I get the following error from clustat -

Could not connect to CMAN: Permission denied


Thanks,
Parvez
--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

Re: [Linux-cluster] Running cluster tools using non-root user

2011-01-27 Thread Parvez Shaikh
I believe Pacemaker is not the same as RHCS, or do they share code?

If yes, in which version of RHCS would this feature be available?

I need to enable a service, disable a service, and get status. I am using
the CLI tools, and any scripting trick that helps me run clusvcadm and/or
clustat would help.

su -c clusvcadm requires entering a password; can this also be eliminated
using sudoers?
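
For example, a sudoers entry along these lines (to be edited with visudo;
the user name and binary paths are assumptions to adapt) would let that user
run the two commands via sudo without a password prompt:

clusteruser ALL = (root) NOPASSWD: /usr/sbin/clustat, /usr/sbin/clusvcadm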

Thanks

On Wed, Jan 26, 2011 at 3:22 PM, Andrew Beekhof and...@beekhof.net wrote:

 [Shameless plug]

 The next version of Pacemaker (1.1.6) will have this feature :-)
  The patches were merged from our devel branch about a week ago.

 [/Shameless plug]

 On Tue, Jan 25, 2011 at 10:39 AM, Parvez Shaikh
 parvez.h.sha...@gmail.com wrote:
  Hi all
 
  Is it possible to run cluster tools like clustat or clusvcadm etc. using
  non-root user?
 
  If yes, to which groups this user should belong to? Otherwise can this be
  done using sudo(and sudoers) file.
 
  As of now I get following error on clustat -
 
  Could not connect to CMAN: Permission denied
 
 
  Thanks,
  Parvez
 
  --
  Linux-cluster mailing list
  Linux-cluster@redhat.com
  https://www.redhat.com/mailman/listinfo/linux-cluster
 

 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

[Linux-cluster] Tuning red hat cluster

2011-02-10 Thread Parvez Shaikh
Hi,

As per my understanding, rgmanager invokes 'status' on resource groups
periodically to determine whether these resources are up or down.

I observed that this period is around 30 seconds. Is it possible to tune
or adjust this period for individual services or resource groups?
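
For context, the mechanism I have come across for this is an action override
on the resource reference in cluster.conf, along the lines of the sketch
below (whether a given rgmanager version honours it should be verified
against its documentation; the service and ip values here are placeholders):

<service name="service1" autostart="1">
  <ip address="192.168.25.100" monitor_link="1">
    <action name="status" interval="10"/>
  </ip>
</service>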

Thanks
--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

[Linux-cluster] SNMP support with IBM Blade Center Fence Agent

2011-02-27 Thread Parvez Shaikh
Hi all,

I have a question related to fence agents and SNMP alarms.

A fence agent can fail to fence the failed node for various reasons; e.g.,
with my bladecenter fencing agent, I sometimes get a message saying
bladecenter fencing failed because of a timeout, or because the fence
device IP address/user credentials are incorrect.

In such a situation, is it possible to generate an SNMP trap?

My cluster config file looks like the one below. In my case, if bladecenter
fencing fails, manual fencing kicks in and requires the user to run
fence_ack_manual; for this, the user must at least be notified via SNMP (or
some other mechanism?) so that he can intervene -

  <clusternodes>
    <clusternode name="blade2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device blade="2" name="BladeCenterFencing"/>
        </method>
        <method name="2">
          <device name="ManualFencing" nodename="blade2"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="blade1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device blade="1" name="BladeCenterFencing"/>
        </method>
        <method name="2">
          <device name="ManualFencing" nodename="blade1"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_bladecenter" ipaddr="blade-mm.com"
                 login="USERID" name="BladeCenterFencing" passwd="PASSW0RD"/>
    <fencedevice agent="fence_manual" name="ManualFencing"/>
  </fencedevices>

Thanks,
Parvez
--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

Re: [Linux-cluster] SNMP support with IBM Blade Center Fence Agent

2011-03-01 Thread Parvez Shaikh
Hi Ryan,

Thank you for the response. Does it mean there is no way to notify the
administrator about a fencing failure as of now?

Let me give more information about my cluster -

I have a set of nodes in a cluster with only an IP resource being protected.
I have two levels of fencing: the first is bladecenter fencing and the
second is manual fencing.

At times, if the machine is already down (either power failure or turned off
abruptly), bladecenter fencing times out and manual fencing happens. At that
point, the administrator is expected to run fence_ack_manual.

Clearly this is not desirable, as the downtime of the services lasts until
the administrator runs fence_ack_manual.

What is the recommended method to deal with bladecenter fencing failure in
this situation? Do I have to add another level of fencing (between
bladecenter and manual) which can fence automatically (not requiring manual
intervention)?


Thanks

On Mon, Feb 28, 2011 at 9:44 PM, Ryan O'Hara roh...@redhat.com wrote:

 On Mon, Feb 28, 2011 at 12:43:10PM +0530, Parvez Shaikh wrote:
  Hi all,
 
  I have a question related to fence agents and SNMP alarms.
 
  Fence Agent can fail to fence the failed node for various reason; e.g.
 with
  my bladecenter fencing agent, I sometimes get message saying bladecenter
  fencing failed because of timeout or fence device IP address/user
  credentials are incorrect.
 
  In such a situation is it possible to generate SNMP trap?

 This feature will be in RHEL6.1. There is a new project called
 'foghorn' that creates SNMPv2 traps from dbus signals.

 git://git.fedorahosted.org/foghorn.git

 In RHEL6.1 (and the latest upstream release), certain cluster
 components will emit dbus signals when certain events occur. This
 includes fencing. So when a node is fenced, a dbus signal is generated
 by fenced. The foghorn service catches this signal and generates an
 SNMPv2 trap.

 Note that foghorn runs as an AgentX subagent, so snmpd must be running
 as the master agentx.
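
 Enabling the AgentX master side of net-snmp is normally a one-line addition
 to /etc/snmp/snmpd.conf (standard net-snmp directive), followed by a
 restart of snmpd:

     master agentx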

 Ryan

  My cluster config file looks like below and in my case if bladecenter
  fencing fails, manual fencing kicks in and requires user to do
  fence_ack_manual, for this user must at least be notified via SNMP (or
 any
  other mechanism?) to intervene  -
 
   <clusternodes>
     <clusternode name="blade2" nodeid="2" votes="1">
       <fence>
         <method name="1">
           <device blade="2" name="BladeCenterFencing"/>
         </method>
         <method name="2">
           <device name="ManualFencing" nodename="blade2"/>
         </method>
       </fence>
     </clusternode>
     <clusternode name="blade1" nodeid="1" votes="1">
       <fence>
         <method name="1">
           <device blade="1" name="BladeCenterFencing"/>
         </method>
         <method name="2">
           <device name="ManualFencing" nodename="blade1"/>
         </method>
       </fence>
     </clusternode>
   </clusternodes>
   <cman expected_votes="1" two_node="1"/>
   <fencedevices>
     <fencedevice agent="fence_bladecenter" ipaddr="blade-mm.com"
                  login="USERID" name="BladeCenterFencing" passwd="PASSW0RD"/>
     <fencedevice agent="fence_manual" name="ManualFencing"/>
   </fencedevices>
 
  Thanks,
  Parvez

  --
  Linux-cluster mailing list
  Linux-cluster@redhat.com
  https://www.redhat.com/mailman/listinfo/linux-cluster

 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

Re: [Linux-cluster] SNMP support with IBM Blade Center Fence Agent

2011-03-04 Thread Parvez Shaikh
Hi Lon,

Thank you for the reply.

What I gathered from your response is to remove manual fencing altogether.
This will cause the fence daemon to retry fence_bladecenter until the node
is fenced. Most likely fenced will succeed in fencing the failed node
(provided the IP, user name and password for the bladecenter management
module are right), even if it times out the first time. Am I right?

I will try removing manual fencing and see how things go.


 If fencing is failing (permanently), you can still run:
   fence_ack_manual -e -n nodename

By the way, as per my understanding, fence_ack_manual -n <node name> can be
executed to acknowledge only a manually fenced node (and not a
bladecenter-fenced node); correct me if this understanding is wrong. So, God
forbid, if fence_bladecenter fails for some reason, we still have the option
to run fence_manual and then fence_ack_manual, so the cluster is back to
working.

Thanks again and have great weekend ahead

Yours truly,
Parvez

On Fri, Mar 4, 2011 at 10:45 PM, Lon Hohberger l...@redhat.com wrote:

 On Tue, Mar 01, 2011 at 06:50:18PM +0530, Parvez Shaikh wrote:
  Hi Ryan,
 
  Thank you for response. Does it mean there is no way to intimate
  administrator about failure of fencing as of now?
 
  Let me give more information about my cluster -
 
  I have set of nodes in cluster with only IP resource being protected. I
 have
  two levels of fencing, first bladecenter fencing and second one is manual
  fencing.

 If the problem you have with fence_bladecenter is intermittent - for
 example, if it fails 1/2 the time, fence_manual is going to *detract*
 from your cluster's ability to recover automatically.

 Ordinarily, if a fencing action fails, fenced will automatically retry
 the operation.

 When you configure fence_manual as a backup, this retry will *never*
 occur, meaning your cluster hangs.


  At times if machine is already down(either power failure or turned off
  abrupty); blade center fencing timesout and manual fencing happens. At
 this
  time, administrator is expected to run fence_ack_manual.

  Clearly this is not something which is desirable, as downtime of services
 is
  as long as administrator runs fence_ack_manual.

  What is recommended method to deal with  blade center fencing failure in
  this situation? Do I have to add another level of fencing(between blade
  center and manual) which can fence automatically(not requiring manual
  interference)?

 Start with removing fence_manual.

 If fencing is failing (permanently), you can still run:

   fence_ack_manual -e -n nodename

my bladecenter fencing agent, I sometimes get message saying
 bladecenter
fencing failed because of timeout or fence device IP address/user
credentials are incorrect.

 ^^ This is why I think fence_manual is, in your specific case, very
 likely hurting your availability.

 --
 Lon Hohberger - Red Hat, Inc.

 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

[Linux-cluster] Two node cluster - a potential problem of node fencing each other?

2011-03-12 Thread Parvez Shaikh
Hi all,

I have a question pertaining to a two-node cluster. I have RHEL 5.5 with the
cluster software that ships with it, and the cluster must have at least two
nodes.

Consider a situation where both nodes of the cluster are up and have a
reliable connection to the fencing device (e.g. a power switch or any other
power fencing device), and the heartbeat link between the two nodes goes
down.

Each node finds that the other node is down (because the heartbeat IP
becomes unreachable) and tries to fence it.

Is this situation possible? If so, can the two nodes actually fence (in
short, shut down or reboot) each other? Is there any way out of this
situation?

Thanks
Parvez
--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

Re: [Linux-cluster] Two node cluster - a potential problem of node fencing each other?

2011-03-13 Thread Parvez Shaikh
Redundant network link - I trust you were referring to Ethernet bonding.

On Sun, Mar 13, 2011 at 1:19 PM, Ian Hayes cthulhucall...@gmail.com wrote:

 On Sat, Mar 12, 2011 at 11:19 PM, Parvez Shaikh parvez.h.sha...@gmail.com
  wrote:

 Hi all,

 I have a question pertaining to two node cluster, I have RHEL 5.5 and
 cluster along with it which at least should have two nodes.

 In a situation where both nodes of the cluster are up, and have reliable
 connection to fencing device (e.g. power switch OR any other power fencing
 device) and heartbeat link between two nodes goes down.

 Each node finds another node is down (because heartbeat IP becomes
 unreachable) and tries to fence each other.

 Is this situation possible? If so, can two nodes possibly fence (in short
 shutdown or reboot) each other? Is there anyway out of this situation?


 This is a fairly common problem called split brain. The two nodes will go
 into a shootout, fencing each other. There are a few ways to prevent this,
 such as redundant network links and the use of quorum disks.


 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

[Linux-cluster] Clustat exit code for service status

2011-03-15 Thread Parvez Shaikh
Hi all,

The command clustat -s <service name> gives the status of a service.

If the service is started (i.e. running on some node), the exit code of this
command is 0; if the service is not running, its exit code is non-zero (I
found it to be 119).

Is this correct, and will it continue to hold in subsequent cluster versions
as well? The reason I am asking is that I would like to use this command in
a shell script to report the status of a service -

clustat -s "<service name>"
if [ $? -eq 0 ]; then
  echo "service is up"
else
  echo "service is not up"
fi
Thanks
Parvez
--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

[Linux-cluster] Node without fencing method, is it possible to failover from such a node?

2011-03-16 Thread Parvez Shaikh
Hi all,

I have a Red Hat cluster on an IBM BladeCenter, with the blades being my
cluster nodes and fence_bladecenter as the fencing agent. I have a couple of
resources: an IP resource which activates or deactivates a floating IP, and
a script resource which starts my server listening on this floating IP. This
is a stateless server with no shared storage requirements or any shared
resources which would require me to use a fancy fencing device.

Everything was working fine: when I disable the ethernet card of the
heartbeat IP or of the floating IP, or pull the power plug, or
reboot/shutdown/halt one node, the IP floats to the other node and the
script starts my server, which happily listens on this IP. Life was good
until I was required to support a cluster of nodes which are not hosted in a
BladeCenter but are plain vanilla nodes.

Now everything remains the same, but bladecenter fencing can't be used, and
as per my understanding, since I am using Red Hat cluster, it requires me to
use some fence method. My first choice would be power fencing, and that is
the only fencing that suits my application's needs.

But is there any way (I know it is not the best or recommended approach, but
I can live with it) to get away without fencing and let the service fail
over in the absence of fence devices configured for the node?

Thanks,
Parvez
--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

Re: [Linux-cluster] Node without fencing method, is it possible to failover from such a node?

2011-03-24 Thread Parvez Shaikh
Guys, thanks a lot for your input.

I have a doubt related to IPMI fencing. In IPMI fencing, we specify the
network address of the IPMI controller.

This is an out-of-band network address, and the IPMI board must have a power
supply different from the cluster node's. Am I right?

Thanks in advance for your help.

Gratefully,
Parvez

On Thu, Mar 17, 2011 at 10:19 PM, Rajagopal Swaminathan 
raju.rajs...@gmail.com wrote:

 Greetings,

 On 3/17/11, Digimer li...@alteeve.com wrote:
  On 03/17/2011 01:25 AM, Parvez Shaikh wrote:
  Hi all,
 
  Life was good until I am now required to support cluster of
  nodes which are not hosted in bladecenter but any vanilla nodes.

 Suggestions from somebody who stupidly yapped I will support manual
 fencing and burnt his finger (Who? Oh! that was me):
 1. Don't commit support for manual fencing
 2. Don't support manual fencing.

 If you are in India, APC Fence PDU is available for around 30-35K INR
 (about a year back or so).
 If someone is ready to invest say 500K INR for HA hardware such as two
 servers etc., they might as well add 35k.

 OTOH, if those nodes are rack mounted servers (Unlike entry level
 server which does not have management port), the cost of the
 Powerfence strip will be a different issue when it comes to
 justifying, etc. within a corporate/Enterprise environment. Too much
 paperwork, I agree. But It will give a more robust infrastructure
 which will help us in using various tools like Zabbix, Spacewalk, snmp
 (I think fence strips have some SNMP - please check) etc. in the
 future.

 Life will be good then.

 With warm regards,

 Rajagopal

 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

[Linux-cluster] Plugged out blade from bladecenter chassis - fence_bladecenter failed

2011-04-27 Thread Parvez Shaikh
Hi all,

I am using RHCS on an IBM BladeCenter with BladeCenter fencing. I pulled a
blade out of its chassis slot and was hoping that failover would occur.
However, when I did so, I got the following messages -

fenced[10240]: agent fence_bladecenter reports: Failed: Unable to obtain
correct plug status or plug is not available
fenced[10240]: fence blade1 failed

Is it supported that failover occurs without manual intervention when I pull
a blade out of its slot? If so, which fencing must I use?

Thanks,
Parvez
--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

Re: [Linux-cluster] oracle DB is not failing over on killin PMON deamon

2011-05-14 Thread Parvez Shaikh
Hi Sufyan

Does your status function return 0 or 1 when the database is up or down,
respectively (i.e. have you tested that it works outside script_db.sh when
run as root)?
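
For example, a minimal status check along these lines is what I mean (only a
sketch - the ORACLE_SID variable and the ora_pmon process name pattern are
assumptions that would need to match your environment):

    status() {
        # succeed only if the pmon background process for this SID is running
        if ps -ef | grep "[o]ra_pmon_${ORACLE_SID}" > /dev/null 2>&1; then
            return 0
        else
            return 1
        fi
    }

If status always returns 0, rgmanager will never notice that PMON was killed.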

On Thu, May 12, 2011 at 12:52 PM, Sufyan Khan sufyan.k...@its.ws wrote:

 First of all, thanks for your quick response.

 Secondly, please note: the working cluster.conf file is attached here; the
 previous file was not correct.
 Yes, orainfra is the user name.

 Any other clue, please?

 sufyan






 -Original Message-
 From: linux-cluster-boun...@redhat.com
 [mailto:linux-cluster-boun...@redhat.com] On Behalf Of Jankowski, Chris
 Sent: Thursday, May 12, 2011 9:44 AM
 To: linux clustering
 Subject: Re: [Linux-cluster] oracle DB is not failing over on killin PMON
 deamon

 Sufyan,

 What username does the instance of Oracle DB run as? Is this orainfra or
 some other username?

 The scripts assume a user named orainfra.
 If you use a different username then you need to modify the scripts
 accordingly.

 Regards,

 Chris Jankowski


 -Original Message-
 From: linux-cluster-boun...@redhat.com
 [mailto:linux-cluster-boun...@redhat.com] On Behalf Of Sufyan Khan
 Sent: Thursday, 12 May 2011 16:27
 To: 'linux clustering'
 Subject: [Linux-cluster] oracle DB is not failing over on killin PMON
 deamon

 Dear All

 I need to set up an HA cluster for my Oracle database.
 I have set up a two-node cluster using system-config-cluster on RHEL 5.5. I
 created a resource group with a shared file system /emc01 (ext3), a shared
 IP and a DB script to monitor the DB.
 My cluster starts perfectly and fails over when the primary node is shut
 down; stopping the shared IP also fails the service over to the other node.
 But on killing the PMON or LSNR process the service does not fail over, and
 the status keeps showing the services running on the primary node.

 I JUST NEED TO KNOW WHERE IS THE PROBLEM.

 ATTACHED IS DB scripts and cluster.conf file.

 Thanks in advance for help.

 Sufyan




 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster

 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

Re: [Linux-cluster] Plugged out blade from bladecenter chassis - fence_bladecenter failed

2011-06-14 Thread Parvez Shaikh
Hi,

Has anyone used missing_as_off in the cluster.conf file?

Any help on where to put this option in cluster.conf would be greatly
appreciated.

Thanks,
Parvez

On Mon, May 2, 2011 at 6:49 PM, Parvez Shaikh parvez.h.sha...@gmail.com wrote:

 Hi Marek,

 I tried the option missing_as_off=1 and now I get an another error -

 fenced[18433]: fence node5.sscdomain failed
 fenced[18433]: fencing node node5.sscdomain

 A snippet of the cluster.conf file is -
 
 <clusternode name="node5" nodeid="5" votes="1">
   <fence>
     <method name="1">
       <device blade="5" name="BladeCenterFencing" missing_as_off="1"/>
     </method>
   </fence>
 </clusternode>
 </clusternodes>

 <fencedevices>
   <fencedevice agent="fence_bladecenter" ipaddr="blade-mm-1"
     login="USERID" name="BladeCenterFencing" passwd="PASSW0RD"/>
 </fencedevices>

 Did I miss something?

 Thanks
 Parvez



 On Mon, May 2, 2011 at 1:03 PM, Marek Grac mg...@redhat.com wrote:

 Hi,


 On 04/29/2011 10:15 AM, Parvez Shaikh wrote:

 Hi Marek,

 Can we give this option in cluster.conf file for bladecenter fencing
 device or method


 for cluster.conf you should add ... missing_as_off=1 ... to fence
 configuration



 For IPMI, fencing is there similar option?


 There is no such method for IPMI.


 m,

 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster



--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

Re: [Linux-cluster] Plugged out blade from bladecenter chassis - fence_bladecenter failed

2011-06-19 Thread Parvez Shaikh
Hi Thanks Dominic,

Does fence_bladecenter always reboot the blade as part of fencing? I have
seen it turn the blade off by default.

Running fence_bladecenter --missing-as-off.. -o off returns a correct result
from the command line, but fencing fails when it goes through fenced. I am
using RHEL 5.5 ES, and fence_bladecenter reports the following version -

fence_bladecenter -V
2.0.115 (built Tue Dec 22 10:05:55 EST 2009)
Copyright (C) Red Hat, Inc.  2004  All rights reserved.
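
For completeness, the manual invocation I am referring to looks roughly like
this (the management module address, credentials and slot number are
placeholders, and I am assuming -n for the blade slot):

    fence_bladecenter --missing-as-off -a <mm-ip> -l USERID -p PASSW0RD -n <blade slot> -o off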


Anyway, thanks for the bugzilla reference.

Regards

On Sun, Jun 19, 2011 at 10:14 PM, dOminic share2...@gmail.com wrote:

 There is a bug  related to missing_as_off  -
 https://bugzilla.redhat.com/show_bug.cgi?id=689851 - expects the fix in
 rhel5u7 .

 regards,

 On Wed, Apr 27, 2011 at 1:59 PM, Parvez Shaikh 
 parvez.h.sha...@gmail.comwrote:

 Hi all,

 I am using RHCS on IBM bladecenter with blade center fencing. I plugged
 out a blade from blade center chassis slot and was hoping that failover to
 occur. However when I did so, I get following message -

 fenced[10240]: agent fence_bladecenter reports: Failed: Unable to obtain
 correct plug status or plug is not available
 fenced[10240]: fence blade1 failed

 Is this supported that if I plug out blade from its slot, then failover
 occur without manual intervention? If so, which fencing must I use?

 Thanks,
 Parvez

 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster



 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

[Linux-cluster] fence_ipmilan fails to reboot

2011-06-30 Thread Parvez Shaikh
Hi all,

I am on RHEL 5.5, and I have two rack-mounted servers with IPMI configured.

When I run the command from the prompt to reboot the server through
fence_ipmilan, it shuts the server down fine but fails to power it back on

# fence_ipmilan -a IPMI IP Address -l admin -p password -o reboot

Rebooting machine @ IPMI:IPMI IP Address...Failed


But I can power it on or off just fine


 # fence_ipmilan -a IPMI IP Address -l admin -p password -o on

Powering on machine @ IPMI:IPMI IP Address...Done


Because of this, my fencing is failing and failover is not happening.

I have a few questions around this -

1. Can we specify the action (off or reboot) in cluster.conf for IPMI LAN
fencing?
2. Is there anything wrong in my configuration? The cluster.conf file is
pasted below.
3. Is this a known issue that is fixed in newer versions?

Here is what my cluster.conf looks like -

<?xml version="1.0"?>
<cluster config_version="4" name="Cluster">
 <fence_daemon post_fail_delay="0" post_join_delay="3"/>
 <clusternodes>
  <clusternode name="blade1.domain" nodeid="1" votes="1">
   <fence>
    <method name="1">
     <device lanplus="" name="IPMI_1"/>
    </method>
   </fence>
  </clusternode>
  <clusternode name="blade2.domain" nodeid="2" votes="1">
   <fence>
    <method name="1">
     <device lanplus="" name="IPMI_2"/>
    </method>
   </fence>
  </clusternode>
 </clusternodes>
 <cman expected_votes="1" two_node="1"/>
 <fencedevices>
  <fencedevice agent="fence_ipmilan" auth="none" ipaddr="IPMI 1 IP Address"
   login="admin" name="IPMI_1" passwd="password"/>
  <fencedevice agent="fence_ipmilan" auth="none" ipaddr="IPMI 2 IP Address"
   login="admin" name="IPMI_2" passwd="password"/>
 </fencedevices>
 <rm>
  <failoverdomains>
   <failoverdomain name="FailoveDomain" ordered="1" restricted="1">
    <failoverdomainnode name="blade1.domain" priority="2"/>
    <failoverdomainnode name="blade2.domain" priority="1"/>
   </failoverdomain>
  </failoverdomains>
  <resources/>
  <service autostart="1" name="service" recovery="relocate"/>
 </rm>
</cluster>

Thanks,
Parvez
--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

Re: [Linux-cluster] fence_ipmilan fails to reboot - SOLVED

2011-07-01 Thread Parvez Shaikh
Hi all,

Thanks for your responses; after setting auth="password", fencing succeeded -

<fencedevice agent="fence_ipmilan" auth="password" ipaddr="IP"
 login="admin" name="IPMI_1" passwd="password"/>

Thanks,
Parvez


On Fri, Jul 1, 2011 at 2:33 PM, שלום קלמר skle...@gmail.com wrote:

 Hi.

 I think you need to add power_wait=10 and lanplus=1.

 Try this line:

 <fencedevice agent="fence_ipmilan" power_wait="10" ipaddr="xx.xx.xx.xx"
  lanplus="1" login="xxxt" name="node1_ilo" passwd="yyy"/>


 Regards

 Shalom.

 On Thu, Jun 30, 2011 at 1:03 PM, Parvez Shaikh 
 parvez.h.sha...@gmail.comwrote:

 Hi all,

 I am on RHEL 5.5; and I have two rack mounted servers with IPMI
 configured.

 When I run command from the prompt to reboot the server through
 fence_ipmilan, it shutsdown the server fine but it fails to power it on

 # fence_ipmilan -a IPMI IP Address -l admin -p password -o reboot

 Rebooting machine @ IPMI:IPMI IP Address...Failed


 But I can power it on or power off just fine


 # fence_ipmilan -a IPMI IP Address -l admin -p password -o on

 Powering on machine @ IPMI:IPMI IP Address...Done


 Due to this my fencing is failing and failover is not happening.

 I have questions around this -

 1. Can we provide action (off or reboot) in cluster.conf for ipmi lan
 fencing?
 2. Is there anything wrong in my configuration? Cluster.conf file is
 pasted below
 3. Is this a known issue which is fixed in newer versions

 Here is how my cluster.conf looks like -

 ?xml version=1.0?
 cluster config_version=4 name=Cluster
  fence_daemon post_fail_delay=0 post_join_delay=3/
  clusternodes
   clusternode name=blade1.domain nodeid=1 votes=1
fence
 method name=1
  device lanplus= name=IPMI_1/
 /method
/fence
   /clusternode
   clusternode name=blade2.domain nodeid=2 votes=1
fence
 method name=1
  device lanplus= name=IPMI_2/
 /method
/fence
   /clusternode
  /clusternodes
  cman expected_votes=1 two_node=1/
  fencedevices
   fencedevice agent=fence_ipmilan auth=none ipaddr=IMPI 1 IP
 Address login=admin name=IPMI_1 passwd=password/
   fencedevice agent=fence_ipmilan auth=none ipaddr=IMPI 2 IP
 Address login=admin name=IPMI_2 passwd=password/
  /fencedevices
  rm
   failoverdomains
failoverdomain name=FailoveDomain ordered=1 restricted=1
 failoverdomainnode name=blade1.domain priority=2/
 failoverdomainnode name=blade2.domain priority=1/
/failoverdomain
   /failoverdomains
   resources/
   service autostart=1 name=service recovery=relocate/
  /rm
 /cluster

 Thanks,
 Parvez

 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster



 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

[Linux-cluster] Configuring failover time with Red Hat Cluster

2011-07-05 Thread Parvez Shaikh
Hi all,

I was trying to find out how much time it takes for RHCS to detect a failure
and recover from it. I found this link -
http://www.redhat.com/whitepapers/rha/RHA_ClusterSuiteWPPDF.pdf

It says that the network polling interval is 2 seconds and 6 retries are
attempted before declaring a node failed. I want to know whether we can tune
or configure this - say, 3 retries instead of 6, or a network polling interval
of 1 second instead of 2 (can it be less than 1 second, which I think would
consume more CPU)?
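
For example, is it just a matter of tuning the totem token timeout in
cluster.conf, with something like the following directly under the cluster
element? (The value is in milliseconds and purely illustrative, and I am
assuming the totem element is honoured by this version of cman.)

    <totem token="21000"/>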

Also, I have a script resource and I see it invoked with the status argument
every 30 seconds; can we configure that as well?

Failover also involves fencing; any pointers on how we can control or
configure the fencing time would also be useful. I use BladeCenter fencing,
IPMI fencing as well as UCS fencing.

Thanks,
Parvez
--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

Re: [Linux-cluster] Configuring failover time with Red Hat Cluster

2011-07-05 Thread Parvez Shaikh
Hello Christine,

Thanks for the link listing the various documents. I have RHCS running on
RHEL 5.5 and it has been working fine. However, I would greatly appreciate
some document or pointers to help me estimate the failover time, or adjust it
if that is possible.

I have been through the Administration Guide and could not find how to adjust
it.

Thanks,
Parvez

On Tue, Jul 5, 2011 at 5:58 PM, Christine Caulfield ccaul...@redhat.com wrote:

 That's a *very* old document. it's from 2003 and refers to RHEL2.1 .. which
 I sincerely hope you weren't planning to implement.

 Before you do anything more I recommend you read the documentation for the
 actual version of clustering you are going to install

 https://access.redhat.com/knowledge/docs/Red_Hat_Enterprise_Linux/

 Chrissie


 On 05/07/11 12:32, Parvez Shaikh wrote:

 Hi all,

 I was trying to find out how much time does it take for RHCS to detect
 failure and recover from it. I found the link -
 http://www.redhat.com/whitepapers/rha/RHA_ClusterSuiteWPPDF.pdf

 It says that network polling interval is 2 seconds and 6 retries are
 attempted before declaring a node as failed. I want to know can we tune
 this or configure it, say instead of 6 retries I want only 3 retries.
 Also reducing network polling time from 2 seconds to say 1 second (can
 it be less than 1 second, which I think would consume more CPU)?

 Also I have a script resource and I see it invoked with status argument
 after every 30 seconds, can we configure that as well?

 Failover also involve fencing, any pointers on how can we control /
 configure fencing time would also be useful,I use bladecenter fencing,
 IPMI fencing as well as UCS fencing.

 Thanks,
 Parvez



 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster


 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

[Linux-cluster] $OCF_ERR_CONFIGURED - recovers service on another cluster node

2012-01-27 Thread Parvez Shaikh
Hi guys,

I am using Red Hat Cluster Suite which comes with RHEL 5.5 -

cman_tool version
6.2.0 config xxx

Now I have a script resource in which I return $OCF_ERR_CONFIGURED in case of
a fatal, irrecoverable error, hoping that my service would not be started on
another cluster node.

But I see that the cluster relocates it to another cluster node and attempts
to start it there.

I referred to the return code documentation at
http://www.linux-ha.org/doc/dev-guides/_return_codes.html

Is there any return code that makes RHCS give up on recovering the service?

Thanks
--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

Re: [Linux-cluster] $OCF_ERR_CONFIGURED - recovers service on another cluster node

2012-01-27 Thread Parvez Shaikh
Hi,

The requirement is to fail the service, not to fail it over to another node,
in case of certain issues which my service detects
(automatically/programmatically) while it starts - if it doesn't find a
prerequisite, it should simply fail. Which error code should I use in my start
function to achieve that?

Thanks

On Fri, Jan 27, 2012 at 3:18 PM, emmanuel segura emi2f...@gmail.com wrote:

 The first thing you can do is stop your cluster service, go to the node
 where you found the problem, and run: rg_test test
 /etc/cluster/cluster.conf start put_the_name_of_the_service

 That way you can see what is wrong.

 2012/1/27 Parvez Shaikh parvez.h.sha...@gmail.com

 Hi guys,

 I am using Red Hat Cluster Suite which comes with RHEL 5.5 -

 cman_tool version
 6.2.0 config xxx

 Now I have a script resource in which I return $OCF_ERR_CONFIGURED; in
 case of a Fatal irrecoverable error, hoping that my service would not start
 on another cluster node.

 But I see that cluster, relocates it to another cluster node and attempts
 to start it.

 I referred error code documentation from
 http://www.linux-ha.org/doc/dev-guides/_return_codes.html

 Is there any return code which makes RHCS to give up on recovering
 service?

 Thanks


 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster




 --
 esta es mi vida e me la vivo hasta que dios quiera

 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

[Linux-cluster] [TOTEM] The consensus timeout expired.

2012-03-26 Thread Parvez Shaikh
Hi all,

I have a cluster with two blades in an IBM BladeCenter. The following error
appears when I start the cman service, and it keeps repeating in
/var/log/messages -

 openais[10770]: [TOTEM] The consensus timeout expired.
 openais[10770]: [TOTEM] entering GATHER state from 3.


The heartbeat IP is available on the blade, and the link to blade2 is also
fine. The cluster on blade2 is not running.

Services like iptables and portmap are also down.

Has anyone encountered such an error and resolved it?

I am using RHEL 5.5

cman_tool version
 6.2.0 config 1




Thanks,
Parvez
--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

[Linux-cluster] Multicast address by CMAN

2012-04-03 Thread Parvez Shaikh
Hi all,

As per my understanding, CMAN uses the cluster name in my cluster.conf to
internally generate the multicast address.

Having two clusters with the same name on a given network leads to issues and
is undesirable.

I want to know whether there is any way to find out if a multicast address is
already in use by some other cluster, so as to avoid using a name that
generates the same multicast IP - or, for that matter, configuring the same
multicast IP explicitly in cluster.conf.
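
For example, would pinning the address explicitly, along these lines, be a
reasonable way to avoid a clash? (The address is just an illustration from the
239.192.0.0/16 range that cman normally picks from.)

    <cman expected_votes="1" two_node="1">
        <multicast addr="239.192.1.1"/>
    </cman>

And I assume cman_tool status on a running cluster shows the multicast address
currently in use, so that is where I would check for a conflict.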

Thanks,
Parvez
--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

[Linux-cluster] clurgmgrd : notice relocating a service to better node

2012-04-11 Thread Parvez Shaikh
Hi,

When I start or enable a service (that was previously disabled) on a cluster
node, I see a message saying clurgmgrd is relocating the service to a better
node.

I do not understand why. I can relocate the service back to the node where I
saw the above message, and it runs fine there.

What could "better node" mean? Better in what sense, given that the hardware
and software configuration of both cluster nodes is the same? What situation
could possibly trigger this?

Thanks,
Parvez
--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

Re: [Linux-cluster] clurgmgrd : notice relocating a service to better node

2012-04-11 Thread Parvez Shaikh
Hi Digimer,


cman_tool version
6.2.0 config 3

RPM versions -

cman-2.0.115-34.el5
rgmanager-2.0.52-6.el5

I am on RHEL 5.5

The configuration is like this -

A cluster of 2 nodes. Each node is an IBM blade hosted in a chassis. A private
network within the chassis is used for the heartbeat between the cluster
nodes, and the cluster service consists of an IP resource and my own server,
which listens on this IP resource.

cluster.conf file -

<?xml version="1.0"?>
<cluster alias="PCluster" config_version="3" name="PCluster">
  <clusternodes>
    <clusternode name="my_blade2.my_domain" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device blade="2" missing_as_off="1" name="BladeCenterFencing"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="my_blade1.my_domain" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device blade="1" missing_as_off="1" name="BladeCenterFencing"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_bladecenter" ipaddr="X" login="USERID"
      name="BladeCenterFencing" passwd="X"/>
  </fencedevices>
  <rm>
    <resources>
      <script file="/localhome/parvez/my_ha" name="my_HaAgent"/>
      <ip address="192.168.11.171" monitor_link="1"/>
      <ip address="192.168.11.175" monitor_link="1"/>
      <ip address="192.168.11.176" monitor_link="1"/>
    </resources>
    <failoverdomains>
      <failoverdomain name="my_domain" nofailback="1" ordered="1" restricted="1">
        <failoverdomainnode name="my_blade2.my_domain" priority="2"/>
        <failoverdomainnode name="my_blade1.my_domain" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <service autostart="0" domain="my_domain" name="my_proc" recovery="relocate">
      <script ref="my_HaAgent"/>
      <ip ref="192.168.11.175"/>
    </service>
  </rm>
  <fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="0"/>
</cluster>


On Wed, Apr 11, 2012 at 11:51 AM, Digimer li...@alteeve.ca wrote:

 On 04/11/2012 02:14 AM, Parvez Shaikh wrote:
  Hi,
 
  When I start or enable a service (that was previously disabled) on a a
  cluster node, I see message saying clurmgrd relocating service to
  better node.
 
  I am not understanding why. I can relocate service back to a node where
  I see above message and it runs fine there.
 
  What does better node could mean? Better in what sense as hardware and
  software configurations of both cluster nodes is same. What situation
  could possibly trigger this?
 
  Thanks,
  Parvez

 What version of the cluster software are you using? What is the
 configuration? To get help, you need to share more details. :)

 --
 Digimer
 Papers and Projects: https://alteeve.com

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

Re: [Linux-cluster] How to add shell script to cluster.conf

2012-09-16 Thread Parvez Shaikh
From this link -

https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/4/html/Cluster_Administration/s1-config-service-dev-CA.html

Script
Name - Enter a name for the custom user script.
File (with path) - Enter the path where this custom script is located
(for example, /etc/init.d/userscript)
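
In cluster.conf terms, a minimal sketch would look something like this (the
resource name, path and service name below are placeholders; the existing NFS
resources stay as they are):

    <rm>
        <resources>
            <script file="/etc/init.d/userscript" name="userscript"/>
        </resources>
        <service autostart="1" name="nfs-ha">
            <!-- existing nfsexport/fs/ip resources here -->
            <script ref="userscript"/>
        </service>
    </rm>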


On Sun, Sep 16, 2012 at 4:16 PM, Ben .T.George bentech4...@gmail.com wrote:

 Hi


 I have an NFS HA setup. How can I add a custom shell script to that
 resource group?

 The NFS HA services are working well. I am working with the cluster.conf
 file directly - please help me add this. I want to touch some information
 after exporting this filesystem.

 Regards,
 Ben

 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster

-- 
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

Re: [Linux-cluster] 2 node cluster showing strange behaviour

2012-09-17 Thread Parvez Shaikh
I had similar issues, though I was using RHEL 5.5.

Please refer to - https://access.redhat.com/knowledge/solutions/18542
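
If it is the same class of problem, the two nodes are probably not seeing each
other's cluster traffic at all. A quick sanity check along these lines may
help (the port numbers are the documented defaults for cman/ricci/dlm and are
an assumption here):

    cman_tool status     # run on both nodes and compare the membership information
    cman_tool nodes      # each node should list both members with status "M"
    iptables -L -n       # make sure 5404/5405 udp (and 11111/21064 tcp) are not blocked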


On Mon, Sep 17, 2012 at 9:22 PM, Ben .T.George bentech4...@gmail.com wrote:



 Hi,

 I have just started building a 2-node cluster. I installed all the Red Hat
 Cluster Suite packages by mounting the RHEL 6 DVD.

 I joined the cluster using Luci. After that my clustat shows this:


 on node1:

 Cluster Status for eccprd @ Mon Sep 17 18:43:31 2012
 Member Status: Quorate

  Member Name ID   Status
  --   --
  cgceccprd1.combinedgroup.net1 Online, Local
  cgceccprd2.combinedgroup.net2 Offline


 on node2:

 Cluster Status for eccprd @ Mon Sep 17 18:43:31 2012
 Member Status: Quorate

  Member Name ID   Status
  --   --
  cgceccprd1.combinedgroup.net1 Offline
  cgceccprd2.combinedgroup.net2 Online, Local

 Both nodes show a different status.
 I restarted many times, and deleted and recreated the cluster many times,
 but the result is the same. Please help me solve this.

 Regards,
 Ben

 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster

-- 
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

Re: [Linux-cluster] linux-cluster

2012-10-02 Thread Parvez Shaikh
What kind of cluster is this - an academic project or a production-quality
solution?

If it's the former - go for manual fencing. You won't need a fence device, but
failover won't be automatic.

If it's the latter - yes, you'll need a fence device.

On Mon, Oct 1, 2012 at 10:15 PM, Rajagopal Swaminathan 
raju.rajs...@gmail.com wrote:

 Greetings,


 Hitesh,

 Please follow list guidlines.


 On Mon, Oct 1, 2012 at 11:49 AM, Digimer li...@alteeve.ca wrote:
  You don't seem to be reading what I am typing. Please go back over the
  various replies and read again what I said. Follow the links and read
 what
  they say.
 
  And please don't reply only to me. Click Reply All and include the
 mailing
  list.
 
 
  I have a constraint of using Linux on bare machine for my 2 desktop.

   Can you please let me know as how can i
  proceed...Do i have
   to purchase
   some sort of hardware?

 Yes. You will need to buy power fencing device -- basically a power
 strip with a ethernet port

 I would strongly suggest you have two network port on each system.

 What you want to do with a cluster?

 
  One more thing...till now i have used this setup:
 
  have Windows vista OS --- Virtual BoxRed Hat installed.
 

 You have to be kidding. You are using vista on bare metal for your HA?

  If i download Xen or KVM can i use the same setup instead of
  Virtual Box?
 
  Windows vista OS Xen or KVM Red Hat installed

 http://www.youtube.com/watch?v=oKI-tD0L18A



   [root@hitesh12 ~]# cat /etc/cluster/cluster.conf
   ?xml version=1.0?
   cluster config_version=1 name=dhoni

 There needs to be two_node directive somewher there. Read up.

 Better yet get help of some local technical person who knows what is HA.

 It is a lot more than simple desktop install.

 Or you need to invest quite a bit of time in learning and money in
 getting some extra hardware (fence devices, switches, NIC, External
 storage -- if required). And dont commit for or do that on production
 without knowing what you are getting into.

 If you can post more descriptively the objective of using cluster,
 perhaps you will get more specific information.

 Digimer's _*excellent*_ tutorial covers more or less all that you need
 to know about clusters.

 I wish I had that when I started playing around with that way back in 2007.

 --
 Regards,

 Rajagopal

 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster

-- 
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

Re: [Linux-cluster] linux-cluster

2012-10-02 Thread Parvez Shaikh
Hi Digimer,

Could you please give me references/case studies of the problems that led to
manual fencing being dropped, and of how automated fencing fixes them?
Thanks,
Parvez

On Tue, Oct 2, 2012 at 7:08 PM, Digimer li...@alteeve.ca wrote:

 On 10/02/2012 04:00 AM, Parvez Shaikh wrote:

 What kind of cluster is this - an academic project or production quality
 solution?

 If its former - go for manual fencing. You wont need fence device but
 failover wont be automatic


 *Please* don't do this. Manual fencing support was dropped for a reason.
 It's *far* too easy to mess things up when an admin uses it before
 identifying a problem.


  If its later - yes you'll need fence device


 This is the only sane option; Academic or production. Fencing is an
 integral part of the cluster and you do yourself no favour by not learning
 it in an academic setup.


 --
 Digimer
 Papers and Projects: https://alteeve.ca
 Hydrogen is just a colourless, odourless gas which, if left alone in
 sufficient quantities for long periods of time, begins to think about
 itself.

-- 
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

Re: [Linux-cluster] Hi

2012-10-02 Thread Parvez Shaikh
A curious observation: there has been a sudden surge of emails sent to
private addresses rather than to the mailing list.

Please send your doubts/questions to the mailing list,
linux-cluster@redhat.com, instead of addressing people personally.

Regarding the configuration for manual fencing - I don't have it with me; it
was available with RHEL 5.5. Check in the system-config-cluster tool whether
you can add manual fencing.

Thanks,
Parvez

On Wed, Oct 3, 2012 at 10:46 AM, Renchu Mathew rench...@gmail.com wrote:

  Hi Purvez,

 I am trying to set up a test cluster environment, but I haven't done
 fencing yet. Please find the error messages below. Some time after the
 nodes restarted, the other node goes down. Can you please send me the
 configuration for manual fencing?


   Please find attached my cluster setup. It is not stable
  and /var/log/messages shows the below errors.
 
 
  Sep 11 08:49:10 node1 corosync[1814]:   [QUORUM] Members[2]: 1 2
  Sep 11 08:49:10 node1 corosync[1814]:   [QUORUM] Members[2]: 1 2
  Sep 11 08:49:10 node1 corosync[1814]:   [CPG   ] chosen downlist:
  sender r(0) ip(192.168.1.251) ; members(old:2 left:1)
  Sep 11 08:49:10 node1 corosync[1814]:   [MAIN  ] Completed service
  synchronization, ready to provide service.
  Sep 11 08:49:11 node1 corosync[1814]: cman killed by node 2 because we
  were killed by cman_tool or other application
  Sep 11 08:49:11 node1 fenced[1875]: telling cman to remove nodeid 2
  from cluster
  Sep 11 08:49:11 node1 fenced[1875]: cluster is down, exiting
  Sep 11 08:49:11 node1 gfs_controld[1950]: cluster is down, exiting
  Sep 11 08:49:11 node1 gfs_controld[1950]: daemon cpg_dispatch error 2
  Sep 11 08:49:11 node1 gfs_controld[1950]: cpg_dispatch error 2
  Sep 11 08:49:11 node1 dlm_controld[1889]: cluster is down, exiting
  Sep 11 08:49:11 node1 dlm_controld[1889]: daemon cpg_dispatch error 2
  Sep 11 08:49:11 node1 dlm_controld[1889]: cpg_dispatch error 2
  Sep 11 08:49:11 node1 dlm_controld[1889]: cpg_dispatch error 2
  Sep 11 08:49:11 node1 dlm_controld[1889]: cpg_dispatch error 2
  Sep 11 08:49:11 node1 fenced[1875]: daemon cpg_dispatch error 2
  Sep 11 08:49:11 node1 rgmanager[2409]: #67: Shutting down uncleanly
  Sep 11 08:49:11 node1 rgmanager[17059]: [clusterfs] unmounting /Data
  Sep 11 08:49:11 node1 rgmanager[17068]: [clusterfs] Sending SIGTERM to
  processes on /Data
  Sep 11 08:49:16 node1 rgmanager[17104]: [clusterfs] unmounting /Data
  Sep 11 08:49:16 node1 rgmanager[17113]: [clusterfs] Sending SIGKILL to
  processes on /Data
  Sep 11 08:49:19 node1 kernel: dlm: closing connection to node 2
  Sep 11 08:49:19 node1 kernel: dlm: closing connection to node 1
  Sep 11 08:49:19 node1 kernel: dlm: gfs2: no userland control daemon,
  stopping lockspace
  Sep 11 08:49:22 node1 rgmanager[17149]: [clusterfs] unmounting /Data
  Sep 11 08:49:22 node1 rgmanager[17158]: [clusterfs] Sending SIGKILL to
  processes on /Data
 
 
 
  Also when I try to restart the cman service, below error comes.
  Starting cluster:
 Checking if cluster has been disabled at boot...[  OK  ]
 Checking Network Manager... [  OK  ]
 Global setup... [  OK  ]
 Loading kernel modules...   [  OK  ]
 Mounting configfs...[  OK  ]
 Starting cman...[  OK  ]
 Waiting for quorum...   [  OK  ]
 Starting fenced...  [  OK  ]
 Starting dlm_controld...[  OK  ]
 Starting gfs_controld...[  OK  ]
 Unfencing self... fence_node: cannot connect to cman
 [FAILED]
  Stopping cluster:
 Leaving fence domain... [  OK  ]
 Stopping gfs_controld...[  OK  ]
 Stopping dlm_controld...[  OK  ]
 Stopping fenced...  [  OK  ]
 Stopping cman...[  OK  ]
 Unloading kernel modules... [  OK  ]
 Unmounting configfs...  [  OK  ]
 
  Thanks again.
  Renchu Mathew
  On Tue, Sep 11, 2012 at 9:10 PM, Arun Eapen CISSP, RHCA
  a...@redhat.com wrote:
 
 
 
  Put the fenced in debug mode and copy the error messages, for
  me to
  debug
 
  On Tue, 2012-09-11 at 11:52 +0400, Renchu Mathew wrote:
   Hi Arun,
  
   I have done the RH436 course in conducted by you at Redhat
  b'lore. How
   r u?
  
   I have configured a 2 node failover cluster setup (almost
  same like
   our RH436 lab setup in b'lore) It is almost ok except
  fencing. If I
   pull the active node 

[Linux-cluster] Not restarting max_restart times before relocating failed service

2012-10-30 Thread Parvez Shaikh
Hi experts,

I have defined a service as follows in cluster.conf -

<service autostart="0" domain="mydomain" exclusive="0"
         max_restarts="5" name="mgmt" recovery="restart">
    <script ref="myHaAgent"/>
    <ip ref="192.168.51.51"/>
</service>

I set max_restarts=5 hoping that if the cluster fails to start the service 5
times, it will then relocate it to the other cluster node in the failover
domain.

To check this, I brought down the NIC hosting the service's floating IP and
got the following logs -

Oct 30 14:11:49  clurgmgrd: [10753]: warning Link for eth1: Not
detected
Oct 30 14:11:49  clurgmgrd: [10753]: warning No link on eth1...
Oct 30 14:11:49  clurgmgrd: [10753]: warning No link on eth1...
Oct 30 14:11:49  clurgmgrd[10753]: notice status on ip
192.168.51.51 returned 1 (generic error)
Oct 30 14:11:49  clurgmgrd[10753]: notice Stopping service
service:mgmt
Oct 30 14:12:00  clurgmgrd[10753]: notice Service service:mgmt is
recovering
Oct 30 14:12:00  clurgmgrd[10753]: notice Recovering failed service
service:mgmt
Oct 30 14:12:00  clurgmgrd[10753]: notice start on ip 192.168.51.51
returned 1 (generic error)
Oct 30 14:12:00  clurgmgrd[10753]: warning #68: Failed to start
service:mgmt; return value: 1
Oct 30 14:12:00  clurgmgrd[10753]: notice Stopping service
service:mgmt
Oct 30 14:12:00  clurgmgrd[10753]: notice Service service:mgmt is
recovering
Oct 30 14:12:00  clurgmgrd[10753]: warning #71: Relocating failed
service service:mgmt
Oct 30 14:12:01  clurgmgrd[10753]: notice Service service:mgmt is
stopped
Oct 30 14:12:01  clurgmgrd[10753]: notice Service service:mgmt is
stopped

But from the log it appears that the cluster tried to restart the service
only ONCE before relocating it.

I was expecting the cluster to retry starting this service five times on the
same node before relocating it.

Can anybody correct my understanding?
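
For reference, the variant I was considering pairs max_restarts with an expiry
window; restart_expire_time below is my assumption of the attribute name, and
the values are only illustrative:

    <service autostart="0" domain="mydomain" exclusive="0"
             max_restarts="5" restart_expire_time="300"
             name="mgmt" recovery="restart">
        <script ref="myHaAgent"/>
        <ip ref="192.168.51.51"/>
    </service>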

Thanks,
Parvez
-- 
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

[Linux-cluster] Monitoring Frequency - can it be changed?

2012-10-30 Thread Parvez Shaikh
Hi experts,

Can we change the frequency at which resources are monitored by the cluster?

I observed 30 seconds as the monitoring frequency.
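
For example, would an explicit action override on the resource be the
supported way to do it? Something like the following is what I have in mind
(the action element and its attributes are my assumption of the syntax, and
the interval is only illustrative):

    <service autostart="0" domain="mydomain" name="mgmt" recovery="restart">
        <script ref="myHaAgent">
            <action name="status" depth="*" interval="10"/>
        </script>
        <ip ref="192.168.51.51"/>
    </service>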

Thanks,
Parvez
-- 
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

Re: [Linux-cluster] Not restarting max_restart times before relocating failed service

2012-10-30 Thread Parvez Shaikh
Hi Digimer,

cman_tool version gives following -

6.2.0 config 22

Cluster.conf -

<?xml version="1.0"?>
<cluster alias="PARVEZ" config_version="22" name="PARVEZ">
        <clusternodes>
                <clusternode name="myblade2" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device blade="2" missing_as_off="1" name="BladeCenterFencing-1"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="myblade1" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device blade="1" missing_as_off="1" name="BladeCenterFencing-1"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_bladecenter" ipaddr="mm-1.mydomain.com"
                 login="" name="BladeCenterFencing-1" passwd="X" shell_timeout="10"/>
        </fencedevices>
        <rm>
                <resources>
                        <script file="/localhome/my/my_ha" name="myHaAgent"/>
                        <ip address="192.168.51.51" monitor_link="1"/>
                </resources>
                <failoverdomains>
                        <failoverdomain name="mydomain" nofailback="1" ordered="1" restricted="1">
                                <failoverdomainnode name="myblade2" priority="2"/>
                                <failoverdomainnode name="myblade1" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <service autostart="0" domain="mydomain" exclusive="0"
                 max_restarts="5" name="mgmt" recovery="restart">
                        <script ref="myHaAgent"/>
                        <ip ref="192.168.51.51"/>
                </service>
        </rm>
        <fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="0"/>
</cluster>

Thanks,
Parvez

On Tue, Oct 30, 2012 at 9:25 PM, Digimer li...@alteeve.ca wrote:

 On 10/30/2012 01:54 AM, Parvez Shaikh wrote:
  Hi experts,
 
  I have defined a service as follows in cluster.conf -
 
  service autostart=0 domain=mydomain exclusive=0
  max_restarts=5 name=mgmt recovery=restart
  script ref=myHaAgent/
  ip ref=192.168.51.51/
  /service
 
  I mentioned max_restarts=5 hoping that if cluster fails to start service
  5 times, then it will relocate to another cluster node in failover
 domain.
 
  To check this, I turned down NIC hosting service's floating IP and got
  following logs -
 
  Oct 30 14:11:49  clurgmgrd: [10753]: warning Link for eth1: Not
  detected
  Oct 30 14:11:49  clurgmgrd: [10753]: warning No link on eth1...
  Oct 30 14:11:49  clurgmgrd: [10753]: warning No link on eth1...
  Oct 30 14:11:49  clurgmgrd[10753]: notice status on ip
  192.168.51.51 returned 1 (generic error)
  Oct 30 14:11:49  clurgmgrd[10753]: notice Stopping service
  service:mgmt
  *Oct 30 14:12:00  clurgmgrd[10753]: notice Service service:mgmt is
  recovering*
  Oct 30 14:12:00  clurgmgrd[10753]: notice Recovering failed
  service service:mgmt
  Oct 30 14:12:00  clurgmgrd[10753]: notice start on ip
  192.168.51.51 returned 1 (generic error)
  Oct 30 14:12:00  clurgmgrd[10753]: warning #68: Failed to start
  service:mgmt; return value: 1
  Oct 30 14:12:00  clurgmgrd[10753]: notice Stopping service
  service:mgmt
  *Oct 30 14:12:00  clurgmgrd[10753]: notice Service service:mgmt is
  recovering
  Oct 30 14:12:00  clurgmgrd[10753]: warning #71: Relocating failed
  service service:mgmt*
  Oct 30 14:12:01  clurgmgrd[10753]: notice Service service:mgmt is
  stopped
  Oct 30 14:12:01  clurgmgrd[10753]: notice Service service:mgmt is
  stopped
 
  But from the log it appears that cluster tried to restart service only
  ONCE before relocating.
 
  I was expecting cluster to retry starting this service five times on the
  same node before relocating
 
  Can anybody correct my understanding?
 
  Thanks,
  Parvez

 What version? Please paste your full cluster.conf.

 --
 Digimer
 Papers and Projects: https://alteeve.ca/w/
 What if the cure for cancer is trapped in the mind of a person without
 access to education?

-- 
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

Re: [Linux-cluster] Not restarting max_restart times before relocating failed service

2012-10-31 Thread Parvez Shaikh
Hi,

I am using recovery="restart", as is evident from the cluster.conf attached earlier.

Thanks,
Parvez

On Wed, Oct 31, 2012 at 2:53 PM, emmanuel segura emi2f...@gmail.com wrote:

 Hello

 Maybe you missing recovery=restart in your services

 2012/10/31 Parvez Shaikh parvez.h.sha...@gmail.com

 Hi Digimer,

 cman_tool version gives following -

 6.2.0 config 22

 Cluster.conf -

 ?xml version=1.0?
 cluster alias=PARVEZ config_version=22 name=PARVEZ
 clusternodes
 clusternode name=myblade2 nodeid=2 votes=1
 fence
 method name=1
 device blade=2
 missing_as_off=1 name=BladeCenterFencing-1/
 /method
 /fence
 /clusternode
 clusternode name=myblade1 nodeid=1 votes=1
 fence
 method name=1
 device blade=1
 missing_as_off=1 name=BladeCenterFencing-1/
 /method
 /fence
 /clusternode
 /clusternodes
 cman expected_votes=1 two_node=1/
 fencedevices
 fencedevice agent=fence_bladecenter ipaddr=
 mm-1.mydomain.com login= name=BladeCenterFencing-1
 passwd=X shell_timeout=10/
 /fencedevices
 rm
 resources
 script file=/localhome/my/my_ha
 name=myHaAgent/
 ip address=192.168.51.51 monitor_link=1/
 /resources
 failoverdomains
 failoverdomain name=mydomain nofailback=1
 ordered=1 restricted=1
 failoverdomainnode name=myblade2
 priority=2/
 failoverdomainnode name=myblade1
 priority=1/
 /failoverdomain
 /failoverdomains
 service autostart=0 domain=mydomain exclusive=0
 max_restarts=5 name=mgmt recovery=restart
 script ref=myHaAgent/
 ip ref=192.168.51.51/
 /service
 /rm
 fence_daemon clean_start=1 post_fail_delay=0
 post_join_delay=0/
 /cluster

 Thanks,
 Parvez

 On Tue, Oct 30, 2012 at 9:25 PM, Digimer li...@alteeve.ca wrote:

 On 10/30/2012 01:54 AM, Parvez Shaikh wrote:
  Hi experts,
 
  I have defined a service as follows in cluster.conf -
 
  service autostart=0 domain=mydomain exclusive=0
  max_restarts=5 name=mgmt recovery=restart
  script ref=myHaAgent/
  ip ref=192.168.51.51/
  /service
 
  I mentioned max_restarts=5 hoping that if cluster fails to start
 service
  5 times, then it will relocate to another cluster node in failover
 domain.
 
  To check this, I turned down NIC hosting service's floating IP and got
  following logs -
 
  Oct 30 14:11:49  clurgmgrd: [10753]: warning Link for eth1: Not
  detected
  Oct 30 14:11:49  clurgmgrd: [10753]: warning No link on eth1...
  Oct 30 14:11:49  clurgmgrd: [10753]: warning No link on eth1...
  Oct 30 14:11:49  clurgmgrd[10753]: notice status on ip
  192.168.51.51 returned 1 (generic error)
  Oct 30 14:11:49  clurgmgrd[10753]: notice Stopping service
  service:mgmt
  *Oct 30 14:12:00  clurgmgrd[10753]: notice Service service:mgmt
 is
  recovering*
  Oct 30 14:12:00  clurgmgrd[10753]: notice Recovering failed
  service service:mgmt
  Oct 30 14:12:00  clurgmgrd[10753]: notice start on ip
  192.168.51.51 returned 1 (generic error)
  Oct 30 14:12:00  clurgmgrd[10753]: warning #68: Failed to start
  service:mgmt; return value: 1
  Oct 30 14:12:00  clurgmgrd[10753]: notice Stopping service
  service:mgmt
  *Oct 30 14:12:00  clurgmgrd[10753]: notice Service service:mgmt
 is
  recovering
  Oct 30 14:12:00  clurgmgrd[10753]: warning #71: Relocating failed
  service service:mgmt*
  Oct 30 14:12:01  clurgmgrd[10753]: notice Service service:mgmt is
  stopped
  Oct 30 14:12:01  clurgmgrd[10753]: notice Service service:mgmt is
  stopped
 
  But from the log it appears that cluster tried to restart service only
  ONCE before relocating.
 
  I was expecting cluster to retry starting this service five times on
 the
  same node before relocating
 
  Can anybody correct my understanding?
 
  Thanks,
  Parvez

 What version? Please paste your full cluster.conf.

 --
 Digimer
 Papers and Projects: https://alteeve.ca/w/
 What if the cure for cancer is trapped in the mind of a person without
 access to education?



 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster




 --
 esta es mi vida e me la vivo hasta que dios quiera

 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster

-- 
Linux-cluster mailing

[Linux-cluster] Normal startup vs startup due to failover on cluster node - can they be distinguished?

2012-11-22 Thread Parvez Shaikh
Hi experts,

I am using the Red Hat Cluster available on RHEL 5.5, and it doesn't have any
built-in mechanism to generate SNMP traps on failures of resources or on
failover of services from one node to another.

I have a script agent which starts, stops and checks the status of my
application. Is it possible, within a script resource, to distinguish between
a normal startup of the service/resource and a startup in response to
failover/failure handling? Doing so would help me write code to generate
alarms when a startup of the service/resource (in my case a process) is due to
failover rather than a normal startup.

Further, is it possible to get information such as the cause of the failure
(leading to the failover) and the previous cluster node on which the
service/resource was running (prior to the failover)?

This would help me provide as much information as possible in the traps.
Thanks,
Parvez
-- 
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

Re: [Linux-cluster] Normal startup vs startup due to failover on cluster node - can they be distinguished?

2012-11-27 Thread Parvez Shaikh
A kind reminder on this.

Any inputs would be of great help. Basically, I intend to have SNMP traps
generated to notify about failures and failovers while using RHCS.

Thanks,
Parvez

On Fri, Nov 23, 2012 at 2:54 PM, satya suresh kolapalli 
kolapallisatya...@gmail.com wrote:

 Hi,

 send the script which you have



 On 23 November 2012 10:55, Parvez Shaikh parvez.h.sha...@gmail.com
 wrote:
  Hi experts,
 
  I am using Red Hat Cluster available on RHEL 5.5. And it doesn't have any
  inbuilt mechanism to generate SNMP traps in failures of resources or
  failover of services from one node to another.
 
  I have a script agent, which starts, stops and checks status of my
  application. Is it possible that in a script resource - to distinguish
  between normal startup of service / resource vs startup of
 service/resource
  in response to failover / failure handling? Doing so would help me write
  code to generate alarms if startup of service / resource (in my case a
  process) is due to failover (not normal startup).
 
  Further is it possible to get information such as cause of
 failure(leading
  to failover), and previous cluster node on which service / resource was
  running(prior to failover)?
 
  This would help to provide as much information as possible in traps
 
  Thanks,
  Parvez
 
  --
  Linux-cluster mailing list
  Linux-cluster@redhat.com
  https://www.redhat.com/mailman/listinfo/linux-cluster



 --
 Regards,
 SatyaSuresh Kolapalli
 Mob: 7702430892

 --
 Linux-cluster mailing list
 Linux-cluster@redhat.com
 https://www.redhat.com/mailman/listinfo/linux-cluster

-- 
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster