Re: [Pacemaker] APC Master Stonith

2010-01-19 Thread Dominik Klein
Errol Neal wrote:
> On Tue, Jan 19, 2010 04:19  PM, Sander van Vugt  wrote:
>> Hi,
>>
>> I hope someone has configured the APC Master Stonith resource (which you
>> would use to have pacemaker to a device like the APC switched rack PDU),
>> as I have a - probably extremely stupid - conceptual question about it. 
>>
>> When I look at the options the resource has, it allows me to enter
>> username, password and IP address. What I would also expect, is to give
>> it something like a name of the node that is should do STONITH on, as
>> well as the port on the device that it should power cycle. Am I missing
>> something? Or do I have to specify this information as additional
>> attributes? And if so, what exactly would be the syntax?
>>
> What type of device are you trying to get the plugin to work with?
> I'm using APC rack PDUs and this plugin did not work by default for me. I had 
> to hack it to get it work for me, but
> it works exactly how I wanted it to. By the way, I'm not using the snmp - i'm 
> using telnet.
> 
> So here is how mine's is configured:
> 
> primitive stonith-apcmaster-axigen2 stonith:apcmaster \
>         params ipaddr="x.x.x.x" login="axigen2" password="x.x.x.x" \
>         op monitor interval="120s" timeout="20s" \
>         op start interval="0" timeout="60s"
> 
> Then I have a constraint that prohibits it a node from committing suicide.
> 
> I'll describe what I did to get it going in my environment. 
> 
> I created a user account for each node on it's respective PDU and only 
> allowed it to control it's own power.
> 
> As I mentioned, I hacked the plugin's source code and recompiled. My changes  
> #1 to make it work and #2, to make it work simple and plain. Login and 
> shut-er down. I can provide you my changes if you think it will work for you.

I'd also be interested in the changes.

The apcmastersnmp plugin does work for me with APC7920 though.

Thanks,
Dominik

___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


Re: [Pacemaker] APC Master Stonith

2010-01-19 Thread E-Blokos
- Original Message - 
From: "Errol Neal" 

To: ; 
Cc: "pacemaker" 
Sent: Tuesday, January 19, 2010 4:37 PM
Subject: Re: [Pacemaker] APC Master Stonith


On Tue, Jan 19, 2010 04:19  PM, Sander van Vugt  
wrote:

Hi,

I hope someone has configured the APC Master Stonith resource (which you
would use to have pacemaker to a device like the APC switched rack PDU),
as I have a - probably extremely stupid - conceptual question about it.

When I look at the options the resource has, it allows me to enter
username, password and IP address. What I would also expect, is to give
it something like a name of the node that is should do STONITH on, as
well as the port on the device that it should power cycle. Am I missing
something? Or do I have to specify this information as additional
attributes? And if so, what exactly would be the syntax?


What type of device are you trying to get the plugin to work with?
I'm using APC rack PDUs and this plugin did not work by default for me. I 
had to hack it to get it work for me, but
it works exactly how I wanted it to. By the way, I'm not using the snmp - 
i'm using telnet.


So here is how mine's is configured:

primitive stonith-apcmaster-axigen2 stonith:apcmaster \
        params ipaddr="x.x.x.x" login="axigen2" password="x.x.x.x" \
        op monitor interval="120s" timeout="20s" \
        op start interval="0" timeout="60s"

Then I have a constraint that prohibits it a node from committing suicide.

I'll describe what I did to get it going in my environment.

I created a user account for each node on it's respective PDU and only 
allowed it to control it's own power.


As I mentioned, I hacked the plugin's source code and recompiled. My 
changes  #1 to make it work and #2, to make it work simple and plain. 
Login and shut-er down. I can provide you my changes if you think it will 
work for you.


-Errol




Hi Errol,

I just bought a used APC AP7900, so if you can provide the hacked code, that
would be cool.


Thanks

Franck 





Re: [Pacemaker] Announce: Pacemaker 1.0.7 (stable) Released

2010-01-19 Thread Thomas Guthmann

Hey,


> Pre-built packages for Pacemaker and its immediate dependencies are currently
> building and will be available for openSUSE, SLES, Fedora, RHEL, CentOS from
> the ClusterLabs Build Area (http://www.clusterlabs.org/rpm) shortly.

Thanks Andrew. I'll move from 1.0.6 + patch to 1.0.7 at the end of the week,
and I will give you feedback if I find anything weird.

Is it also possible to have corosync 1.2.0 in the repository?

Thomas



Re: [Pacemaker] APC Master Stonith

2010-01-19 Thread Errol Neal
On Tue, Jan 19, 2010 04:19  PM, Sander van Vugt  wrote:
> Hi,
> 
> I hope someone has configured the APC Master Stonith resource (which you
> would use to have pacemaker to a device like the APC switched rack PDU),
> as I have a - probably extremely stupid - conceptual question about it. 
> 
> When I look at the options the resource has, it allows me to enter
> username, password and IP address. What I would also expect, is to give
> it something like a name of the node that is should do STONITH on, as
> well as the port on the device that it should power cycle. Am I missing
> something? Or do I have to specify this information as additional
> attributes? And if so, what exactly would be the syntax?
> 
What type of device are you trying to get the plugin to work with?
I'm using APC rack PDUs, and this plugin did not work by default for me. I had
to hack it to get it to work, but now it works exactly how I wanted it to.
By the way, I'm not using SNMP; I'm using telnet.

So here is how mine is configured:

primitive stonith-apcmaster-axigen2 stonith:apcmaster \
        params ipaddr="x.x.x.x" login="axigen2" password="x.x.x.x" \
        op monitor interval="120s" timeout="20s" \
        op start interval="0" timeout="60s"

Then I have a constraint that prohibits a node from committing suicide.
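
A minimal sketch of what such a constraint could look like in crm shell syntax, assuming the node fenced by this resource is named axigen2 (the node name and the constraint id are illustrative; the post does not show the actual constraint):

```
location l-stonith-axigen2-not-on-self stonith-apcmaster-axigen2 -inf: axigen2
```

With a -inf score the stonith resource is never placed on axigen2, so only another node can use it to power axigen2 off.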

I'll describe what I did to get it going in my environment. 

I created a user account for each node on its respective PDU and only allowed
it to control its own power.

As I mentioned, I hacked the plugin's source code and recompiled. My changes were
#1, to make it work, and #2, to make it work simply and plainly: log in and shut 'er
down. I can provide you with my changes if you think they will work for you.

-Errol



[Pacemaker] APC Master Stonith

2010-01-19 Thread Sander van Vugt
Hi,

I hope someone has configured the APC Master Stonith resource (which you
would use to have pacemaker talk to a device like the APC switched rack PDU),
as I have a - probably extremely stupid - conceptual question about it.

When I look at the options the resource has, it allows me to enter a
username, password and IP address. What I would also expect is to give
it something like the name of the node that it should do STONITH on, as
well as the port on the device that it should power cycle. Am I missing
something? Or do I have to specify this information as additional
attributes? And if so, what exactly would be the syntax?

Thanks for enlightening me,
Sander





Re: [Pacemaker] Question on resource groups

2010-01-19 Thread Dejan Muhamedagic
Hi,

On Tue, Jan 19, 2010 at 11:38:06AM -0500, Ken Dechick wrote:
> Hello all,
> 
> Quick question here today. Please forgive me if this has been
> answered, I have searched for a couple days and not been able
> to come up with the answer. I am working on a standard 2 node
> cluster using DRBD and I have my resources in a group. All in
> working well, but my question has to do with what happens when
> there is a problem with an individual service. Consider the
> following example using heartbeat (3.0.1-1) drbd (8.3.6) and
> pacemaker (1.0.6):
> 
> Cluster with one reosurce group which contains these resources in this order: 
>    
>    -drbd master/slave
>    -virtual file system
>    -openvpn
>    -samba
>    -apache webserver
>    -cupsd
> 
> Problem I am running into is if there is a problem with openvpn
> in this example (VPN goes down and keys are missing so it
> CANNOT restart without intervention), watching the cluster with
> crm_mon, I see that all the services under openvpn in order
> (samba,apache, cupsd) will all starta "rolling restart". In
> other words, I see openvpn fail, then samba goes down, then
> apache goes down, then cups goes down. Next cups comes up,
> apache comes up, samba comes up, then openvpn tries to start
> but fails so the progress starts over - smba, apache and cups
> stop then start again. What I end up with is a system where
> those last 3 services which runs fine alone keep coming up then
> going down again, over and over. Only way I can change this is
> to fix the openvpn issue, then things restart and stay
> restarted.
> 
> My question is: is this normal (expected) behavior?

Yes.

> If so how
> do I change this?

Reconfigure. Your group doesn't properly represent the relationships
between the resources. I guess that all four services depend on
drbd and the filesystem, but not on each other. You can then create
a non-ordered group with those four resources and collocate/order
that group with the drbd/fs group.
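
A rough sketch of that layout in crm shell syntax. All resource and constraint names here are made up for illustration, and it assumes the four primitives are already defined and that the drbd/filesystem part is available as a group called g-storage:

```
group g-services openvpn samba apache cupsd \
        meta ordered="false"
colocation c-services-with-storage inf: g-services g-storage
order o-storage-before-services inf: g-storage g-services
```

The intent is that a failure of openvpn no longer triggers a restart of samba, apache and cupsd. This is untested here, so verify it on a test cluster (for example by watching crm_mon) before relying on it.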

Thanks,

Dejan

> I have tried several on-fail options in the
> monitors for those services (tried: stop, restart, and block)
> but this doesn't change the behavior. I would like to just have
> the one service stop without affecting the others. Do I need to
> re-think using a resource group?? Any assistance would be
> greatly appreciated. The pacemaker site has a lot of
> documentation but it's not the clearest explainations at times.
> 
> -Thanks
> 
> Kenneth M DeChick
> Linux Systems Administrator
> Community Computer Service, Inc.
> (315)-255-1751 ext154
> http://www.medent.com
> k...@medent.com
> Registered Linux User #497318
> -- -- -- -- -- -- -- -- -- -- --
> "You canna change the laws of physics, Captain; I've got to have 
> thirtyminutes! "
> 
> 





Re: [Pacemaker] crm_attribte failed with Multiple attributes match

2010-01-19 Thread hj lee
On Tue, Jan 19, 2010 at 9:08 AM, Andrew Beekhof  wrote:

> I ran your script, all I get is:
>
> [r...@pcmk-4 ~]# ./test.sh 0
> scope=status  name=master-vmrd-res:0 value=(null)
> Error performing operation: The object/attribute does not exist
> scope=status  name=master-vmrd-res:0 value=1
> scope=status  name=master-vmrd-res:0 value=2
> scope=status  name=master-vmrd-res:0 value=3
> scope=status  name=master-vmrd-res:0 value=4
> scope=status  name=master-vmrd-res:0 value=5
> scope=status  name=master-vmrd-res:0 value=6
> scope=status  name=master-vmrd-res:0 value=7
>
> which is pretty much what I'd expect.
> Perhaps try a more recent version?
>
>

Which version are you running? How long did you run my script? On my CentOS
5.3 with pacemaker-0.80.5, it happens within a few minutes. I found that
libxml2 returns multiple entries for the XPath query. I think the problem is in
the default libxml2 installed on CentOS 5.3; after I upgraded it to libxml2-2.7.3,
the problem went away. I will test it with the rpms in epel-5.

By the way, if pacemaker-0.80.5 is an outdated version, please remove it from
the repository.

Thanks very much anyway.
hj


[Pacemaker] Question on resource groups

2010-01-19 Thread Ken Dechick
Hello all,

Quick question here today. Please forgive me if this has been answered; I have
searched for a couple of days and have not been able to come up with the answer. I am
working on a standard 2-node cluster using DRBD, and I have my resources in a
group. All is working well, but my question has to do with what happens when
there is a problem with an individual service. Consider the following example
using heartbeat (3.0.1-1), drbd (8.3.6) and pacemaker (1.0.6):

Cluster with one resource group which contains these resources in this order:

   -drbd master/slave
   -virtual file system
   -openvpn
   -samba
   -apache webserver
   -cupsd

The problem I am running into is that if there is a problem with openvpn in this
example (the VPN goes down and keys are missing, so it CANNOT restart without
intervention), then, watching the cluster with crm_mon, I see that all the services
after openvpn in the order (samba, apache, cupsd) start a "rolling restart". In other
words, I see openvpn fail, then samba goes down, then apache goes down, then cups goes
down. Next cups comes up, apache comes up, samba comes up, then openvpn tries
to start but fails, so the process starts over: samba, apache and cups stop
and then start again. What I end up with is a system where those last 3 services,
which run fine on their own, keep coming up and then going down again, over and over.
The only way I can change this is to fix the openvpn issue; then things restart and
stay restarted.

My question is: is this normal (expected) behavior? If so, how do I change this?
I have tried several on-fail options in the monitors for those services (tried:
stop, restart, and block), but this doesn't change the behavior. I would like to
just have the one service stop without affecting the others. Do I need to
re-think using a resource group? Any assistance would be greatly appreciated.
The pacemaker site has a lot of documentation, but its explanations are not always
the clearest.

-Thanks

Kenneth M DeChick
Linux Systems Administrator
Community Computer Service, Inc.
(315)-255-1751 ext154
http://www.medent.com
k...@medent.com
Registered Linux User #497318
-- -- -- -- -- -- -- -- -- -- --
"You canna change the laws of physics, Captain; I've got to have thirty minutes!"




Re: [Pacemaker] errors in corosync.log

2010-01-19 Thread Andrew Beekhof
On Tue, Jan 19, 2010 at 4:59 PM, Shravan Mishra
 wrote:
> cibadmin 1.0.5 for OpenAIS and Heartbeat (Build:
> 9e9faaab40f3f97e3c0d623e4a4c47ed83fa1601)

That version is too old to support corosync.
I'm surprised it even compiles.

You need at least Pacemaker 1.0.6 and Corosync 1.1.2
   http://theclusterguy.clusterlabs.org/post/230672127/pacemaker-1-0-6-released



Re: [Pacemaker] crm_attribte failed with Multiple attributes match

2010-01-19 Thread Andrew Beekhof
I ran your script, all I get is:

[r...@pcmk-4 ~]# ./test.sh 0
scope=status  name=master-vmrd-res:0 value=(null)
Error performing operation: The object/attribute does not exist
scope=status  name=master-vmrd-res:0 value=1
scope=status  name=master-vmrd-res:0 value=2
scope=status  name=master-vmrd-res:0 value=3
scope=status  name=master-vmrd-res:0 value=4
scope=status  name=master-vmrd-res:0 value=5
scope=status  name=master-vmrd-res:0 value=6
scope=status  name=master-vmrd-res:0 value=7

which is pretty much what I'd expect.
Perhaps try a more recent version?



Re: [Pacemaker] errors in corosync.log

2010-01-19 Thread Shravan Mishra
Corosync Cluster Engine, version '1.1.1' SVN revision '2534'
Copyright (c) 2006-2009 Red Hat, Inc.

Shravan





Re: [Pacemaker] errors in corosync.log

2010-01-19 Thread Shravan Mishra
cibadmin 1.0.5 for OpenAIS and Heartbeat (Build:
9e9faaab40f3f97e3c0d623e4a4c47ed83fa1601)

-Shravan




Re: [Pacemaker] crm_attribte failed with Multiple attributes match

2010-01-19 Thread Andrew Beekhof
On Tue, Jan 12, 2010 at 10:15 PM, hj lee  wrote:
>
> Is the cib code thread-safe?

Probably not. But there are no threads in the Pacemaker code so it
doesn't really matter.



Re: [Pacemaker] [Openais] Problem with cluster linux HA

2010-01-19 Thread Andrew Beekhof
On Mon, Jan 18, 2010 at 2:46 PM, Galera, Daniel  wrote:
> Hello all,
>
> I have 2 SUSE Linux Enterprise 11 servers with the High Availability
> Extension. I'm configuring a 2-node cluster with only 1 group to run.
> I use SBD as STONITH. I set the cluster up correctly without problems. Now I
> want to cluster an application named HPOS. For that I need to have in
> the group:
>   SFEX --> to lock the drive
>   LVM --> to activate the VG
>   Filesystem --> to mount the 3 filesystems needed
>   IP --> to bring the cluster IP online
> and then two LSB resources to run the 2 processes of the HPOS application.
> Anyway, the application is not the problem. The problem is that when I test
> the cluster and, for example, MOVE the resource to the other node (server1), the
> group goes down and server2 appears as offline with STONITH UNCLEAN.

Usually that happens when a resource fails to stop.
Please use hb_report to generate a tarball, and indicate which node you
tried to move the resource to (and how).

> That is what I see when checking from server1. If at that moment I check
> crm_mon from server2, I see server2 as online but server1 down. No idea
> what the problem is.
>
> Attached is the cluster XML config file.
>
> Attached are the log files of the 2 nodes from when I executed the MOVE
> RESOURCE that failed.
>
> Am I missing a resource location or some other expected setting?
>
> Do you have a cluster example so I can configure mine correctly?
>
> regards
>
> Dani
>
> ___
> Openais mailing list
> open...@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/openais
>



Re: [Pacemaker] errors in corosync.log

2010-01-19 Thread Andrew Beekhof
On Sat, Jan 16, 2010 at 9:20 PM, Shravan Mishra
 wrote:
> Hi Guys,
>
> I'm running the following version of pacemaker and corosync
> corosync=1.1.1-1-2
> pacemaker=1.0.9-2-1

That pacemaker version doesn't exist.
What does cibadmin --version say?

And are you sure about the corosync version? It doesn't look right either.

>
> Everything had been running fine for quite some time, but then I
> started seeing the following errors in the corosync logs:
>
>
> =
> Jan 16 15:08:39 corosync [TOTEM ] Received message has invalid
> digest... ignoring.
> Jan 16 15:08:39 corosync [TOTEM ] Invalid packet data
> Jan 16 15:08:39 corosync [TOTEM ] Received message has invalid
> digest... ignoring.
> Jan 16 15:08:39 corosync [TOTEM ] Invalid packet data
> Jan 16 15:08:39 corosync [TOTEM ] Received message has invalid
> digest... ignoring.
> Jan 16 15:08:39 corosync [TOTEM ] Invalid packet data
> 
>
> I can perform all the crm shell commands and what not but it's
> troubling that the above is happening.
>
> My crm_mon output looks good.
>
>
> I also checked the authkey and ran md5sum on both nodes; it's the same.
>
> Then I stopped corosync and regenerated the authkey with
> corosync-keygen and copied it to the other machine, but I still get
> the above message in the corosync log.
>
> Is there anything other than the authkey that I should look into?
>
>
> corosync.conf
>
> 
>
> # Please read the corosync.conf.5 manual page
> compatibility: whitetank
>
> totem {
>        version: 2
>        token: 3000
>        token_retransmits_before_loss_const: 10
>        join: 60
>        consensus: 1500
>        vsftype: none
>        max_messages: 20
>        clear_node_high_bit: yes
>        secauth: on
>        threads: 0
>        rrp_mode: passive
>
>        interface {
>                ringnumber: 0
>                bindnetaddr: 192.168.2.0
>                #mcastaddr: 226.94.1.1
>                broadcast: yes
>                mcastport: 5405
>        }
>        interface {
>                ringnumber: 1
>                bindnetaddr: 172.20.20.0
>                #mcastaddr: 226.94.1.1
>                broadcast: yes
>                mcastport: 5405
>        }
> }
>
>
> logging {
>        fileline: off
>        to_stderr: yes
>        to_logfile: yes
>        to_syslog: yes
>        logfile: /tmp/corosync.log
>        debug: off
>        timestamp: on
>        logger_subsys {
>                subsys: AMF
>                debug: off
>        }
> }
>
> service {
>        name: pacemaker
>        ver: 0
> }
>
> aisexec {
>        user:root
>        group: root
> }
>
> amf {
>        mode: disabled
> }
>
>
> ===
>
>
> Thanks
> Shravan
>
>



Re: [Pacemaker] 1.0.7 upgraded, restarting resources problem

2010-01-19 Thread Dejan Muhamedagic
Hi,

On Mon, Jan 18, 2010 at 10:00:39PM +0100, Martin Gombač wrote:
> Hi,
> 
> I have one m/s drbd resource and one Xen instance on top. Both m/s
> instances are primary.
> When I restart the node that's _not_ hosting the Xen instance (ibm1),
> pacemaker restarts the running Xen instance on the other node (ibm2).
> There is no need to do that. I thought it got fixed
> (http://developerbugs.linux-foundation.org/show_bug.cgi?id=2153).
> Didn't it?
> 
> Here is my config once more. Please note the WARNING showed up only
> after the upgrade.
> (BTW, setting the drbd0predHosting score to 0 avoids the restart, but it
> doesn't help resource ordering either.)
> 
> [r...@ibm1 etc]# crm configure show
> WARNING: notify: operation name not recognized

That's from the shell, please ignore it. Strange, the operation
list should've been updated a long time ago.

Thanks,

Dejan

> node $id="3d430f49-b915-4d52-a32b-b0799fa17ae7" ibm2
> node $id="4b2047c8-f3a0-4935-84a2-967b548598c9" ibm1
> primitive Hosting ocf:heartbeat:Xen \
>         params xmfile="/etc/xen/Hosting.cfg" shutdown_timeout="303" \
>         meta target-role="Started" allow-migrate="true" is-managed="true" \
>         op monitor interval="120s" timeout="506s" start-delay="5s" \
>         op migrate_to interval="0s" timeout="304s" \
>         op migrate_from interval="0s" timeout="304s" \
>         op stop interval="0s" timeout="304s" \
>         op start interval="0s" timeout="202s"
> primitive drbd_r0 ocf:linbit:drbd \
>         params drbd_resource="r0" \
>         op monitor interval="15s" role="Master" timeout="30s" \
>         op monitor interval="30s" role="Slave" timeout="30s" \
>         op stop interval="0s" timeout="501s" \
>         op notify interval="0s" timeout="90s" \
>         op demote interval="0s" timeout="90s" \
>         op promote interval="0s" timeout="90s" \
>         op start interval="0s" timeout="255s"
> ms ms_drbd_r0 drbd_r0 \
>         meta notify="true" master-max="2" interleave="true" is-managed="true" target-role="Started"
> order drbd0predHosting inf: ms_drbd_r0:promote Hosting:start
> property $id="cib-bootstrap-options" \
>         dc-version="1.0.7-b1191b11d4b56dcae8f34715d52532561b875cd5" \
>         cluster-infrastructure="Heartbeat" \
>         stonith-enabled="false" \
>         no-quorum-policy="ignore" \
>         default-resource-stickiness="10" \
>         last-lrm-refresh="1263845352"
> 
> All I want is to have just one resource, Hosting, started after drbd
> has been promoted (primary) on the node where it is starting.
> Please advise me if you can.
> 
> Thank you,
> regards,
> M.
> 



Re: [Pacemaker] [GUI][PATCH]An orphan resource was displayed by GUI.

2010-01-19 Thread Yan Gao
Hi,

renayama19661...@ybb.ne.jp wrote:
> Hi,
> 
> An orphan resource was displayed by the GUI.
> The displayed orphan resource does not exist anywhere.
> 
> Step1) I start the corosync service.
> 
> Step2) I load cib.xml with the cibadmin command.
> 
> Step3) I check the resources in the GUI.
> 
> An orphan resource of the clone is displayed by the GUI.
> However, this orphan resource (clnUmResource:2) should not exist.
> 
> I attached a screenshot of the GUI and the output of hb_report.
> 
> I have also attached a revised version of the patch.
> However, this patch may be incomplete.
> 
> Please review the patch and help solve this problem.
Thanks for reporting and the patch!
Could you please try the attached patch?

Regards,
  Yan

-- 
Yan Gao 
Software Engineer
China Server Team, OPS Engineering, Novell, Inc.
diff -r 8b56e4d03dc8 mgmt/daemon/mgmt_crm.c
--- a/mgmt/daemon/mgmt_crm.c	Tue Jan 12 13:25:09 2010 +0800
+++ b/mgmt/daemon/mgmt_crm.c	Tue Jan 19 17:11:53 2010 +0800
@@ -1475,7 +1475,10 @@
 	ret = strdup(MSG_OK);
 	while (cur != NULL) {
 		resource_t* rsc = (resource_t*)cur->data;
-		ret = mgmt_msg_append(ret, rsc->id);
+		gboolean is_active = rsc->fns->active(rsc, FALSE);
+		if (is_not_set(rsc->flags, pe_rsc_orphan) || is_active) {
+			ret = mgmt_msg_append(ret, rsc->id);
+		}
 		cur = g_list_next(cur);
 	}
 	free_data_set(data_set);