Re: [Pacemaker] Listing all resources of a specific type

2013-02-20 Thread Dan Frincu
Hi,

On Tue, Feb 19, 2013 at 10:42 PM, Donald Stahl  wrote:
> Is there some way of listing all resources that use a specific
> resource agent using crm shell?
>
> For example:
>
> # crm resource list
>  stonith-sbd(stonith:external/sbd) Started
>  IP1 (ocf::heartbeat:IPaddr2) Started
>  IP2 (ocf::heartbeat:IPaddr2) Started
>  FS1 (ocf::heartbeat:Filesystem) Started
>  FS2 (ocf::heartbeat:Filesystem) Started
>
> I'd like to be able to filter by the resource agents- for example the
> ocf::heartbeat:Filesystem agent.
>
> Much like:
> # crm resource list | grep ocf::heartbeat:Filesystem
>  FS1 (ocf::heartbeat:Filesystem) Started
>  FS2 (ocf::heartbeat:Filesystem) Started
>
> Obviously I can use grep but I'd love to know if there were a native
> way of doing this.

There's crm ra list class:provider:type, but I guess you want to find
RAs that are actively used in the configuration, whereas that command
returns all matching RAs installed on the system, configured or not.
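
If you don't mind going through the CIB instead of the crm shell,
something like this should also work (untested here; it assumes your
cibadmin supports XPath queries):

# cibadmin --query --xpath "//primitive[@type='Filesystem'][@provider='heartbeat']"

It returns the matching primitive definitions as XML, so it's still
filtering rather than a native listing.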

>
> Thanks,
> -Don
>



-- 
Dan Frincu
CCNA, RHCE



Re: [Pacemaker] Online add a new node to cluster communicating by UDPU

2013-02-12 Thread Dan Frincu
On Tue, Feb 12, 2013 at 1:28 PM, Vladislav Bogdanov
 wrote:
> 12.02.2013 14:11, Viacheslav Biriukov wrote:
>> Why don't you use it? Do you know any issues with this method?
>
> I just did not need it yet.
>
> And, one also needs to check that it is possible to cleanly delete nodes

Since the hostnames don't change, there shouldn't be a requirement to
delete the node. If the IPs are dynamically allocated but stay the
same, and you don't hit bugs such as the dnsmasq one mentioned
earlier, then the DHCP renewal process won't take the interface down
(IIRC the client asks the server to renew the lease when half of the
time allocated to the current lease has expired, and retries several
times before the lease expires).

Dynamically adding nodes to the cluster shouldn't be a problem.
Removing them should be done manually (that's how I see it), as you
can't differentiate between a node which has been down for a prolonged
period of time due to maintenance and one which is no longer part of
the cluster and should be removed automatically.

My 2 cents.
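
For reference, manual removal would be something along these lines
(hedged; the node name is hypothetical and the exact syntax depends on
your pacemaker/crmsh versions):

# crm node delete node3

or, at a lower level, with the node already stopped:

# crm_node --force --remove node3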

> from a CIB (both configuration and status sections) and they do not
> reappear there anymore (I recall related issues in the past when node
> reappears in CIB after membership change). Hopefully that was fixed, but
> I'm not sure. Also, as I do not play with cluster size changes right
> now, I don't know exactly how pacemaker currently deals with
> dynamic changes to the number of clone instances.
>
> May be Andrew or David can comment on this?
>
> I must admit that it would be very nice to have dynamic membership
> support polished in 2.0 :)
>
> Vladislav
>
>



-- 
Dan Frincu
CCNA, RHCE



Re: [Pacemaker] Online add a new node to cluster communicating by UDPU

2013-02-12 Thread Dan Frincu
Hi,

On Tue, Feb 12, 2013 at 11:10 AM, Michal Fiala  wrote:
> Hello,
>
> is there a way how to online add a new node to corosync/pacemaker
> cluster, where nodes communicate by unicast UDP?

I don't think this is possible, as you need to update corosync.conf on
all nodes to include the node being added, and changes to
corosync.conf only take effect after corosync has been restarted.
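
To illustrate what that update looks like: with udpu, each node's
corosync.conf carries an explicit member list, so adding a node means
appending a member block on every node (addresses hypothetical):

interface {
        member {
                memberaddr: 172.17.0.104
        }
        member {
                memberaddr: 172.17.0.105
        }
        member {
                # the node being added
                memberaddr: 172.17.0.106
        }
        ringnumber: 0
        bindnetaddr: 172.17.0.0
        mcastport: 5426
}

followed by a restart of corosync on each node, one at a time.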

>
> Thanks
>
> Michal
>



-- 
Dan Frincu
CCNA, RHCE



Re: [Pacemaker] RES: Reboot of cluster members with heavy load on filesystem.

2013-02-11 Thread Dan Frincu
mcastaddr: 226.94.1.1
>> mcastport: 5406
>> ttl: 1
>> }
>> }
>>
>> Can you kindly point what timer/counter should I play with?
>
> I would start by making these higher, perhaps double them and see what
> effect it has.
>
> token:  5000
> token_retransmits_before_loss_const: 10
>
>> What are the reasonable values for them? I got scared by this warning: "It
>> is not recommended to alter this value without guidance from the corosync
>> community."
>> Are there any benefits to changing the rrp_mode from active to passive?

rrp_mode: passive is better tested than active. That's the only real benefit.
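
On the token settings asked about above: doubling them means editing
the totem section of corosync.conf on both nodes, along these lines (a
sketch; the values are just the quoted ones doubled, not a tested
recommendation):

totem {
        token: 10000
        token_retransmits_before_loss_const: 20
        ...
}

and then restarting corosync for the change to take effect.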

>
> Not something I've played with, sorry.
>
>> Should it be done on both hosts?
>
> It should be the same I would imagine.
>
>>
>>> > 
>>> >
>>> > Feb  6 04:30:32 apolo lrmd: [2855]: info: RA output: (httpd:0:monitor:stderr) redirecting to systemctl
>>> > Feb  6 04:31:32 apolo lrmd: [2855]: info: RA output: (httpd:0:monitor:stderr) redirecting to systemctl
>>> > Feb  6 04:31:41 apolo corosync[2848]:  [TOTEM ] A processor failed, forming new configuration.
>>> > Feb  6 04:31:47 apolo corosync[2848]:  [CLM   ] CLM CONFIGURATION CHANGE
>>> > Feb  6 04:31:47 apolo corosync[2848]:  [CLM   ] New Configuration:
>>> > Feb  6 04:31:47 apolo corosync[2848]:  [CLM   ] #011r(0) ip(10.10.1.1) r(1) ip(10.10.10.8)
>>> > Feb  6 04:31:47 apolo corosync[2848]:  [CLM   ] Members Left:
>>> > Feb  6 04:31:47 apolo corosync[2848]:  [CLM   ] #011r(0) ip(10.10.1.2) r(1) ip(10.10.10.9)
>>> > Feb  6 04:31:47 apolo corosync[2848]:  [CLM   ] Members Joined:
>>> > Feb  6 04:31:47 apolo corosync[2848]:  [pcmk  ] notice: pcmk_peer_update: Transitional membership event on ring 304: memb=1, new=0, lost=1
>>
>> [snip]
>>
>>> >
>>> > After lots of logs, apolo asks diana to reboot, and some time after that
>>> > it gets rebooted too. We had an old cluster with heartbeat where DRBD
>>> > used to cause this on that system, but now it looks like Pacemaker is
>>> > the culprit.
>>> >
>>> > Here is my Pacemaker and DRBD configuration
>>> > http://www2.connection.com.br/cbastos/pacemaker/crm_config
>>> > http://www2.connection.com.br/cbastos/pacemaker/drbd_conf/global_commo
>>> > n.setup
>>> > http://www2.connection.com.br/cbastos/pacemaker/drbd_conf/backup.res
>>> > http://www2.connection.com.br/cbastos/pacemaker/drbd_conf/export.res
>>> >
>>> > And more detailed logs
>>> > http://www2.connection.com.br/cbastos/pacemaker/reboot_apolo
>>> > http://www2.connection.com.br/cbastos/pacemaker/reboot_diana
>>> >
>>
>> Best regards,
>> Carlos.
>>
>>
>>
>



-- 
Dan Frincu
CCNA, RHCE



Re: [Pacemaker] Corosync over DHCP IP

2013-02-11 Thread Dan Frincu
cemakerd:   notice: pcmk_shutdown_worker:
> Shutdown complete
> Feb 10 07:56:27 [5242] host1 pacemakerd: info: main:   Exiting
> pacemakerd
>
>
> corosync.conf:
>
> compatibility: whitetank
>
> totem {
> version: 2
> secauth: off
> nodeid: 104
> interface {
> member {
> memberaddr: 172.17.0.104
> }
> member {
> memberaddr: 172.17.0.105
> }
> ringnumber: 0
> bindnetaddr: 172.17.0.0
> mcastport: 5426
> ttl: 1
> }
> transport: udpu
> }
>
> logging {
> fileline: off
> to_logfile: yes
> to_syslog: yes
> debug: on
> logfile: /var/log/cluster/corosync.log
> debug: off
> timestamp: on
> logger_subsys {
> subsys: AMF
> debug: off
> }
> }
> service {
># Load the Pacemaker Cluster Resource Manager
>ver:   1
>name:  pacemaker
> }
>
> aisexec {
>user:   root
>group:  root
> }
>
>
>
> Thank you!
>
> --
> Viacheslav Biriukov
> BR
> http://biriukov.me
>



-- 
Dan Frincu
CCNA, RHCE



Re: [Pacemaker] crmsh on fedora 18

2013-02-04 Thread Dan Frincu
Hi,

On Mon, Feb 4, 2013 at 9:38 AM, emmanuel segura  wrote:
> Hello List
>
> Sorry for this stupid question, but I would like to know if I can install
> crmsh on Fedora 18. I know Fedora 18 uses pcs, but I don't like pcs.

Maybe this helps.

http://www.gossamer-threads.com/lists/linuxha/pacemaker/83637

>
> Thanks
>
> --
> this is my life and I live it for as long as God wills



-- 
Dan Frincu
CCNA, RHCE



Re: [Pacemaker] No communication between nodes (setup problem)

2013-01-30 Thread Dan Frincu
 egrep "warning|error"
> Jan 30 10:25:59 [1608] server1   crmd:  warning: do_log:FSA: Input
> I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
> Jan 30 10:25:59 [1607] server1pengine:  warning: cluster_status:We
> do not have quorum - fencing and resource management disabled
> Jan 30 10:28:25 [1525] server1 corosync debug   [QUORUM] getinfo response
> error: 1
> Jan 30 10:40:59 [1607] server1pengine:  warning: cluster_status:We
> do not have quorum - fencing and resource management disabled
>
>
> root@server2 corosync]# cat /var/log/cluster/corosync.log | egrep
> "warning|error"
> Jan 30 10:27:18 [1458] server2   crmd:  warning: do_log:FSA: Input
> I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
> Jan 30 10:27:18 [1457] server2pengine:  warning: cluster_status:We
> do not have quorum - fencing and resource management disabled
> Jan 30 10:29:19 [1349] server2 corosync debug   [QUORUM] getinfo response
> error: 1
> Jan 30 10:42:18 [1457] server2pengine:  warning: cluster_status:We
> do not have quorum - fencing and resource management disabled
> Jan 30 10:44:36 [1349] server2 corosync debug   [QUORUM] getinfo response
> error: 1
>
>
>
>
> We have installed the following packages:
>
> corosync-2.2.0-1.fc18.i686
> corosynclib-2.2.0-1.fc18.i686
> drbd-bash-completion-8.3.13-1.fc18.i686
> drbd-pacemaker-8.3.13-1.fc18.i686
> drbd-utils-8.3.13-1.fc18.i686
> pacemaker-1.1.8-3.fc18.i686
> pacemaker-cli-1.1.8-3.fc18.i686
> pacemaker-cluster-libs-1.1.8-3.fc18.i686
> pacemaker-libs-1.1.8-3.fc18.i686
> pcs-0.9.27-3.fc18.i686
>
>
>
> Firewalls are disabled, Pinging and SSH communication is working without any
> problems.
>
> With best regards
>



-- 
Dan Frincu
CCNA, RHCE



Re: [Pacemaker] best/proper way to shut down a node for service

2013-01-24 Thread Dan Frincu
Hi,

On Wed, Jan 23, 2013 at 11:28 PM, Brian J. Murrell
 wrote:
> On 13-01-23 03:32 AM, Dan Frincu wrote:
>> Hi,
>
> Hi,
>
>> I usually put the node in standby, which means it can no longer run
>> any resources on it. Both Pacemaker and Corosync continue to run, and
>> the node still provides quorum.
>
> But a node in standby will still be STONITHed if it goes AWOL.  I put a
> node in standby and then yanked its power, and its peer started STONITH
> operations on it.  That's the part I want to avoid.

You have to explain what AWOL means in this context. Even in a 2-node
cluster, putting one node in standby without changing no-quorum-policy
to ignore or setting stonith-enabled=false will just move the
resources off the node.

Failure to stop a resource running on a node which is in the shutdown
procedure will lead to STONITH. (Shutting down Pacemaker and putting
the node in standby have the same effect on the resources: both tell
them to stop.)

So just to emphasize this again: if there is a stop failure,
regardless of how you turn off the resource (Pacemaker shutdown,
putting the node in standby, telling the resource to move to another
node, etc.), the node will be STONITHed.

Now, going back to no-quorum-policy: the default action is stop, so in
a 2-node cluster, if you shut down Pacemaker without setting
no-quorum-policy to ignore, quorum is lost and the resources on the
remaining node stop. By putting the node in standby instead, quorum is
still met and this does not take place.
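
(For reference, in a 2-node cluster that usually means the following,
via the crm shell:

# crm configure property no-quorum-policy=ignore

so that losing the peer doesn't stop resources on the survivor.)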

Once a node is in standby, if you then want to stop pacemaker and
corosync, that won't lead to the "node going AWOL" situation you
mentioned earlier.

Having more than 2 nodes in a cluster means that shutting down
pacemaker and corosync, or putting a node in standby, won't affect
quorum, as the other nodes still work.

Either way, choose whatever fits your requirements best; I just added
some comments on how this would work and what the possible problems
would be in a 2-node cluster.

HTH,
Dan

>
> b.
>
>
>



-- 
Dan Frincu
CCNA, RHCE



Re: [Pacemaker] best/proper way to shut down a node for service

2013-01-23 Thread Dan Frincu
Hi,

On Wed, Jan 23, 2013 at 5:21 AM, Brian J. Murrell  wrote:
> OK.  So you have a corosync cluster of nodes with pacemaker managing
> resources on them, including (of course) STONITH.
>
> What's the best/proper way to shut down a node, say, for maintenance
> such that pacemaker doesn't go trying to "fix" that situation and
> STONITHing it to try to bring it back up, etc.?
>
> Currently my practice for STONITH is to have it reboot.  Maybe it's a
> better practice to have STONITH configured to just power a node down and
> not try to power it back up for this exact reason?
>
> Any other suggestions welcome.

I usually put the node in standby, which means it can no longer run
any resources on it. Both Pacemaker and Corosync continue to run, and
the node still provides quorum.

For global cluster maintenance, such as when upgrading to a major
software version, maintenance-mode is needed.
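
In crm shell terms that would be something like this (node name
hypothetical):

# crm node standby node1
... do the maintenance, then ...
# crm node online node1

and for global maintenance:

# crm configure property maintenance-mode=true

setting it back to false once you're done.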

HTH,
Dan

>
> Cheers,
> b.
>
>



-- 
Dan Frincu
CCNA, RHCE



Re: [Pacemaker] [Linux-HA] Which language Pacemaker is written?

2013-01-11 Thread Dan Frincu
Hi,

From https://www.ohloh.net/p/pacemaker?ref=sample

C 76%
Python 8%
shell script 6%
Other 10%

HTH,
Dan

On Fri, Jan 11, 2013 at 1:15 PM, Felipe Gutierrez
 wrote:
> Hi everyone,
>
> I am writing a school paper about programming languages, and I want to
> research Pacemaker and the language it is written in.
>
> Which language is Pacemaker written in?
> I search at internet and this person said about Erlang
> http://manavar.blogspot.com.br/2012/02/cluster-software-pacemaker-erlang-and.html
>
> Is that right?
>
>
> Thanks,
> Felipe
>
>
> --
> *--
> -- Felipe Oliveira Gutierrez
> -- felipe.o.gutier...@gmail.com
> -- https://sites.google.com/site/lipe82/Home/diaadia*



-- 
Dan Frincu
CCNA, RHCE



[Pacemaker] crm shell binaries

2013-01-09 Thread Dan Frincu
@Dejan, @LMB

Could you guys post binaries of crmsh for RedHat, Debian?

Regards,
Dan

-- 
Dan Frincu
CCNA, RHCE



Re: [Pacemaker] glassfish resource agent

2012-12-10 Thread Dan Frincu
Hi,

On Mon, Dec 10, 2012 at 6:53 AM, Soni Maula Harriz
 wrote:
> dear forum,
> is there any ready-to-use glassfish resource agent? Because I can't find
> any on google.
Not that I know of.
> or do i have to make it by myself ?
I think you do.
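
If you do end up writing one, the skeleton is not much work. Here is a
rough, untested sketch of the shape an OCF agent takes (the asadmin
calls, the domain parameter and the pgrep pattern are assumptions; a
real agent also needs proper meta-data XML and a validate-all action):

#!/bin/sh
# Illustrative OCF-style resource agent sketch for GlassFish.
# Load the OCF shell functions (the path may differ per distribution).
. ${OCF_ROOT:-/usr/lib/ocf}/resource.d/heartbeat/.ocf-shellfuncs

# Instance parameter (hypothetical), passed by the cluster as OCF_RESKEY_domain.
DOMAIN="${OCF_RESKEY_domain:-domain1}"

case "$1" in
start)
        asadmin start-domain "$DOMAIN" && exit $OCF_SUCCESS
        exit $OCF_ERR_GENERIC
        ;;
stop)
        # A real agent must verify the domain actually stopped.
        asadmin stop-domain "$DOMAIN"
        exit $OCF_SUCCESS
        ;;
monitor)
        pgrep -f "glassfish.*$DOMAIN" >/dev/null && exit $OCF_SUCCESS
        exit $OCF_NOT_RUNNING
        ;;
meta-data)
        # Stub; a real agent prints full resource-agent meta-data XML here.
        echo '<resource-agent name="glassfish"/>'
        exit $OCF_SUCCESS
        ;;
*)
        exit $OCF_ERR_UNIMPLEMENTED
        ;;
esac
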
> thanks
>



-- 
Dan Frincu
CCNA, RHCE



Re: [Pacemaker] How to add primitive resource to an already existing Group using crm

2012-09-12 Thread Dan Frincu
Hi,

On Wed, Sep 12, 2012 at 11:56 AM, Kashif Jawed Siddiqui
 wrote:
> Hi,
>
>
>
> I would like to know if there is a way to add a new primitive resource to
> an already existing group.
>
>
>
> I know crm configure edit requires manual editing.
>
>
>
> But is there a direct command?
>
>
>
> Like,
>
> crm configure group Grp1 Res1 Res2 Res3  ## This is used to create group
>
>
>
> How do I add a new resource to an existing group using a command?

Assuming the primitive is already added, you could create a new file
(say it's called group-update) and put the following in it:

group Grp1 Res1 Res2 Res3 this-is-the-new-res-name

Then you could do:

crm configure load update /path/to/group-update

Do test it first; I have only tried this on a shadow CIB.
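
As a side note, newer crmsh versions also have a dedicated command for
this (check whether the version you run ships it):

crm configure modgroup Grp1 add this-is-the-new-res-name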

HTH,
Dan

>
>
>
> Regards,
> Kashif Jawed Siddiqui
>
>
> ***
> This e-mail and attachments contain confidential information from HUAWEI,
> which is intended only for the person or entity whose address is listed
> above. Any use of the information contained herein in any way (including,
> but not limited to, total or partial disclosure, reproduction, or
> dissemination) by persons other than the intended recipient's) is
> prohibited. If you receive this e-mail in error, please notify the sender by
> phone or email immediately and delete it!
>
>



-- 
Dan Frincu
CCNA, RHCE



Re: [Pacemaker] Change Hostname

2012-09-06 Thread Dan Frincu
Hi,

On Thu, Sep 6, 2012 at 1:58 PM, Dan Frincu  wrote:
> Hi,
>
> On Thu, Sep 6, 2012 at 1:35 PM, Thorsten Rehm  wrote:
>> Hi everyone,
>>
>> nobody has an idea?
>> Have I missed something in the documentation?
>
> Put the cluster in maintenance-mode.
> Stop Pacemaker, stop Corosync.
> Change the hostname.
> Check if the change actually worked.
> Start Corosync, start Pacemaker.
> Perform a reprobe and refresh from crm. Remove maintenance-mode.
> Delete old node names from cluster configuration (crm node delete
> $old-hostname).

My bad, you're running on Heartbeat.

>
> Dance.
>
> HTH,
> Dan
>
>>
>> Regards,
>> Thorsten
>>
>>
>> On Tue, Sep 4, 2012 at 10:55 AM, Thorsten Rehm  
>> wrote:
>>> Hi,
>>>
>>> ohh, thanks, but I have heartbeat in use.
>>> "Legacy cluster stack based on heartbeat"
>>> http://www.clusterlabs.org/wiki/File:Stack-lha.png
>>>
>>> So, there is no corosync.conf ;)
>>>
>>> Regards,
>>> Thorsten
>>>
>>> On Tue, Sep 4, 2012 at 10:38 AM, Vit Pelcak  wrote:
>>>> -BEGIN PGP SIGNED MESSAGE-
>>>> Hash: SHA1
>>>>
>>>> Dne 4.9.2012 10:28, Thorsten Rehm napsal(a):
>>>>> Hi everyone,
>>>>>
>>>>> I have a cluster with three nodes (stack: heartbeat) and I need to
>>>>> change the hostname of all systems (only the hostname, not the ip
>>>>> address or other network configuration). I have already made
>>>>> several attempts, but so far I have not managed that resources are
>>>>> available without interruption, after I changed the hostname. Is
>>>>> there a procedure that allows me to change the hostname, without
>>>>> loss of resources? If so, how would this look like? Is there a best
>>>>> case?
>>>>
>>>>
>>>> Hm. What about modifying corosync.conf to reflect hostname change on
>>>> all nodes, restarting corosync on all one after another (so you always
>>>> have at least 2 nodes running corosync and resources) and then
>>>> changing that hostname on desired machine and restarting corosync on it?
>>>>
>>>> In general, do not stop corosync on more than 1 node at the time and
>>>> you should be safe.
>>>>
>>>>> Cheers, Thorsten
>>>>>
>>>>
>>>> -BEGIN PGP SIGNATURE-
>>>> Version: GnuPG v2.0.19 (GNU/Linux)
>>>> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>>>>
>>>> iQEcBAEBAgAGBQJQRb4AAAoJEG+ytY6bjOob0AUH+gKl8OXHnGUUkXe4rFNc1qqr
>>>> W1hkafkjDOl2k475kiXiJ9CbgvP4mJSZJ+naMvyh53BJDuWiZH4i3kl1KZVSCvQ6
>>>> DNrZhHG90BmTLXiE6tCeVWP6K5tKamvLCRGBehiu83lW2kdH0X3uF9KqZlPnBFhy
>>>> AeEYvCsJKfM+u7WndNDFeQVdV//FQaHAB8JZBkgSyHmlvN+bnjUzRTOE1qLyv3/b
>>>> nPYVBOYCJgBjmENRRMoP1xWZgAAMeRCzRrpXo2ZSJ8945E/pmc1+9fPDJCqBXqvr
>>>> CFzI7iZcyidfpKq6h1S9dlDDMdRidj9P8kfEokThtHXpy45/LhdzYrMg6LmvuIc=
>>>> =tZ+G
>>>> -END PGP SIGNATURE-
>>>>
>>>
>>>
>>>
>>> --
>>> Mit freundlichen Gruessen / Kind regards
>>> Thorsten Rehm
>>
>>
>>
>> --
>> Mit freundlichen Gruessen / Kind regards
>> Thorsten Rehm
>>
>
>
>
> --
> Dan Frincu
> CCNA, RHCE



-- 
Dan Frincu
CCNA, RHCE



Re: [Pacemaker] Change Hostname

2012-09-06 Thread Dan Frincu
Hi,

On Thu, Sep 6, 2012 at 1:35 PM, Thorsten Rehm  wrote:
> Hi everyone,
>
> nobody has an idea?
> Have I missed something in the documentation?

Put the cluster in maintenance-mode.
Stop Pacemaker, stop Corosync.
Change the hostname.
Check if the change actually worked.
Start Corosync, start Pacemaker.
Perform a reprobe and refresh from crm. Remove maintenance-mode.
Delete old node names from cluster configuration (crm node delete
$old-hostname).
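
As a rough command sketch of the above (hedged; adjust to your init
scripts and crm shell version, names hypothetical):

# crm configure property maintenance-mode=true
# service pacemaker stop && service corosync stop
# hostname new-name   (and make it persistent, e.g. in /etc/sysconfig/network)
# service corosync start && service pacemaker start
# crm resource reprobe
# crm configure property maintenance-mode=false
# crm node delete old-hostname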

Dance.

HTH,
Dan

>
> Regards,
> Thorsten
>
>
> On Tue, Sep 4, 2012 at 10:55 AM, Thorsten Rehm  
> wrote:
>> Hi,
>>
>> ohh, thanks, but I have heartbeat in use.
>> "Legacy cluster stack based on heartbeat"
>> http://www.clusterlabs.org/wiki/File:Stack-lha.png
>>
>> So, there is no corosync.conf ;)
>>
>> Regards,
>> Thorsten
>>
>> On Tue, Sep 4, 2012 at 10:38 AM, Vit Pelcak  wrote:
>>> -BEGIN PGP SIGNED MESSAGE-
>>> Hash: SHA1
>>>
>>> Dne 4.9.2012 10:28, Thorsten Rehm napsal(a):
>>>> Hi everyone,
>>>>
>>>> I have a cluster with three nodes (stack: heartbeat) and I need to
>>>> change the hostname of all systems (only the hostname, not the ip
>>>> address or other network configuration). I have already made
>>>> several attempts, but so far I have not managed that resources are
>>>> available without interruption, after I changed the hostname. Is
>>>> there a procedure that allows me to change the hostname, without
>>>> loss of resources? If so, how would this look like? Is there a best
>>>> case?
>>>
>>>
>>> Hm. What about modifying corosync.conf to reflect hostname change on
>>> all nodes, restarting corosync on all one after another (so you always
>>> have at least 2 nodes running corosync and resources) and then
>>> changing that hostname on desired machine and restarting corosync on it?
>>>
>>> In general, do not stop corosync on more than 1 node at the time and
>>> you should be safe.
>>>
>>>> Cheers, Thorsten
>>>>
>>>
>>> -BEGIN PGP SIGNATURE-
>>> Version: GnuPG v2.0.19 (GNU/Linux)
>>> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>>>
>>> iQEcBAEBAgAGBQJQRb4AAAoJEG+ytY6bjOob0AUH+gKl8OXHnGUUkXe4rFNc1qqr
>>> W1hkafkjDOl2k475kiXiJ9CbgvP4mJSZJ+naMvyh53BJDuWiZH4i3kl1KZVSCvQ6
>>> DNrZhHG90BmTLXiE6tCeVWP6K5tKamvLCRGBehiu83lW2kdH0X3uF9KqZlPnBFhy
>>> AeEYvCsJKfM+u7WndNDFeQVdV//FQaHAB8JZBkgSyHmlvN+bnjUzRTOE1qLyv3/b
>>> nPYVBOYCJgBjmENRRMoP1xWZgAAMeRCzRrpXo2ZSJ8945E/pmc1+9fPDJCqBXqvr
>>> CFzI7iZcyidfpKq6h1S9dlDDMdRidj9P8kfEokThtHXpy45/LhdzYrMg6LmvuIc=
>>> =tZ+G
>>> -END PGP SIGNATURE-
>>>
>>
>>
>>
>> --
>> Mit freundlichen Gruessen / Kind regards
>> Thorsten Rehm
>
>
>
> --
> Mit freundlichen Gruessen / Kind regards
> Thorsten Rehm
>



-- 
Dan Frincu
CCNA, RHCE



Re: [Pacemaker] Tool to query Corosync multicast configuration?

2012-08-13 Thread Dan Frincu
Hi,

On Mon, Aug 13, 2012 at 4:45 PM, Andreas Ntaflos
 wrote:
> Hi,
>
> is it possible to somehow query the multicast address(es) and port(s)
> used by Corosync? I mean other than using grep and awk:
>
> egrep "mcastaddr:" /etc/corosync/corosync.conf| awk '{print $2}'
>
> Is there a commandline tool that displays such information? I have
> looked at corosync-cfgtool, but neither the "-a" nor "-s" switch makes
> it output any multicast information.

netstat -ng
netstat -tupan | grep corosync

It uses both multicast and unicast.
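
On corosync 1.x you can also try dumping the object database and
filtering (hedged; whether the totem keys show up there depends on the
version):

# corosync-objctl | grep -i mcast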

HTH,
Dan

>
> The reason I am asking is that I want to write a Puppet/Facter fact so
> that we get some overview over our many two-node clusters and their
> multicast configurations.
>
> Thanks,
>
> Andreas
>
>



-- 
Dan Frincu
CCNA, RHCE



Re: [Pacemaker] Centos 6.2 corosync errors after reboot prevent joining

2012-07-03 Thread Dan Frincu
Hi,

On Mon, Jul 2, 2012 at 7:47 PM, Martin de Koning  wrote:
> Hi all,
>
> Reasonably new to pacemaker and having some issues with corosync loading the
> pacemaker plugin after a reboot of the node. It looks like similar issues
> have been posted before but I haven't found a relevant fix.
>
> The Centos 6.2 node was online before the reboot and restarting the corosync
> and pacemaker services caused no issues. Since the reboot and subsequent
> reboots, I am unable to get pacemaker to join the cluster.
>
> After the reboot corosync now reports the following:
> Jul  2 17:56:22 sessredis-03 corosync[1644]:   [pcmk  ] WARN:
> route_ais_message: Sending message to local.cib failed: ipc delivery failed
> (rc=-2)
> Jul  2 17:56:22 sessredis-03 corosync[1644]:   [pcmk  ] WARN:
> route_ais_message: Sending message to local.cib failed: ipc delivery failed
> (rc=-2)
> Jul  2 17:56:22 sessredis-03 corosync[1644]:   [pcmk  ] WARN:
> route_ais_message: Sending message to local.cib failed: ipc delivery failed
> (rc=-2)
> Jul  2 17:56:22 sessredis-03 corosync[1644]:   [pcmk  ] WARN:
> route_ais_message: Sending message to local.cib failed: ipc delivery failed
> (rc=-2)
> Jul  2 17:56:22 sessredis-03 corosync[1644]:   [pcmk  ] WARN:
> route_ais_message: Sending message to local.cib failed: ipc delivery failed
> (rc=-2)
> Jul  2 17:56:22 sessredis-03 corosync[1644]:   [pcmk  ] WARN:
> route_ais_message: Sending message to local.crmd failed: ipc delivery failed
> (rc=-2)
>
> The full syslog is here:
> http://pastebin.com/raw.php?i=f9eBuqUh
>
> corosync-1.4.1-4.el6_2.3.x86_64
> pacemaker-1.1.6-3.el6.x86_64
>
> I have checked the obvious such as inter-cluster communication and
> firewall rules. It appears to me that there may be an issue with the
> Pacemaker cluster information base and not corosync. Any ideas? Can I clear
> the CIB manually somehow to resolve this?

What does "corosync-objctl | grep member" return? Can you see the same
multicast groups on all of the nodes when you run "netstat -ng"?

To clear the CIB manually do a "rm -rfi /var/lib/heartbeat/crm/*" on
the faulty node (with corosync and pacemaker stopped), then start
corosync and pacemaker.

HTH,
Dan

>
> Cheers
> Martin
>
>
>



-- 
Dan Frincu
CCNA, RHCE



Re: [Pacemaker] Different Corosync Rings for Different Nodes in Same Cluster?

2012-06-29 Thread Dan Frincu
Hi,

On Thu, Jun 28, 2012 at 6:13 PM, Andrew Martin  wrote:
> Hi Dan,
>
> Thanks for the help. If I configure the network as I described - ring 0 as
> the network all 3 nodes are on, ring 1 as the network only 2 of the nodes
> are on, and using "passive" - and the ring 0 network goes down, corosync
> will start using ring 1. Does this mean that the quorum node will appear to
> be offline to the cluster? Will the cluster attempt to STONITH it? Once the
> ring 0 network is available again, will corosync transition back to using it
> as the communication ring, or will it continue to use ring 1 until it fails?
>
> The ideal behavior would be when ring 0 fails it then communicates over ring
> 1, but keeps periodically checking to see if ring 0 is working again. Once
> it is, it returns to using ring 0. Is this possible?

Added corosync ML in CC as I think this is better asked here as well.

Regards,
Dan

>
> Thanks,
>
> Andrew
>
> 
> From: "Dan Frincu" 
> To: "The Pacemaker cluster resource manager" 
> Sent: Wednesday, June 27, 2012 3:42:42 AM
> Subject: Re: [Pacemaker] Different Corosync Rings for Different Nodes
> inSame Cluster?
>
>
> Hi,
>
> On Tue, Jun 26, 2012 at 9:53 PM, Andrew Martin  wrote:
>> Hello,
>>
>> I am setting up a 3 node cluster with Corosync + Pacemaker on Ubuntu 12.04
>> server. Two of the nodes are "real" nodes, while the 3rd is in standby
>> mode
>> as a quorum node. The two "real" nodes each have two NICs, one that is
>> connected to a shared LAN and the other that is directly connected between
>> the two nodes (for DRBD replication). The quorum node is only connected to
>> the shared LAN. I would like to have multiple Corosync rings for
>> redundancy,
>> however I do not know if this would cause problems for the quorum node. Is
>> it possible for me to configure the shared LAN as ring 0 (which all 3
>> nodes
>> are connected to) and set the rrp_mode to passive so that it will use ring
>> 0
>> unless there is a failure, but to also configure the direct link between
>> the
>> two "real" nodes as ring 1?
>
> Short answer, yes.
>
> Longer answer. I have a setup with two nodes with two interfaces, one
> is connected via a switch to the other node and one is a back-to-back
> link for DRBD replication. In Corosync I have two rings, one that goes
> via the switch and one via the back-to-back link (rrp_mode: active).
> With rrp_mode: passive it should work the way you mentioned.
>
> HTH,
> Dan
>
>>
>> Thanks,
>>
>> Andrew
>>
>
>
>
> --
> Dan Frincu
> CCNA, RHCE
>



-- 
Dan Frincu
CCNA, RHCE



Re: [Pacemaker] Different Corosync Rings for Different Nodes in Same Cluster?

2012-06-27 Thread Dan Frincu
Hi,

On Tue, Jun 26, 2012 at 9:53 PM, Andrew Martin  wrote:
> Hello,
>
> I am setting up a 3 node cluster with Corosync + Pacemaker on Ubuntu 12.04
> server. Two of the nodes are "real" nodes, while the 3rd is in standby mode
> as a quorum node. The two "real" nodes each have two NICs, one that is
> connected to a shared LAN and the other that is directly connected between
> the two nodes (for DRBD replication). The quorum node is only connected to
> the shared LAN. I would like to have multiple Corosync rings for redundancy,
> however I do not know if this would cause problems for the quorum node. Is
> it possible for me to configure the shared LAN as ring 0 (which all 3 nodes
> are connected to) and set the rrp_mode to passive so that it will use ring 0
> unless there is a failure, but to also configure the direct link between the
> two "real" nodes as ring 1?

Short answer, yes.

Longer answer. I have a setup with two nodes with two interfaces, one
is connected via a switch to the other node and one is a back-to-back
link for DRBD replication. In Corosync I have two rings, one that goes
via the switch and one via the back-to-back link (rrp_mode: active).
With rrp_mode: passive it should work the way you mentioned.
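
For reference, the totem part of corosync.conf for such a setup would
look roughly like this (networks and multicast addresses hypothetical):

totem {
        version: 2
        rrp_mode: passive
        interface {
                ringnumber: 0
                bindnetaddr: 10.0.0.0      # shared LAN, all 3 nodes
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
        interface {
                ringnumber: 1
                bindnetaddr: 192.168.1.0   # back-to-back DRBD link
                mcastaddr: 226.94.1.2
                mcastport: 5407
        }
}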

HTH,
Dan

>
> Thanks,
>
> Andrew
>



-- 
Dan Frincu
CCNA, RHCE



Re: [Pacemaker] DRBD < LVM < EXT4 < NFS performance

2012-05-24 Thread Dan Frincu
Hi,

On Mon, May 21, 2012 at 4:24 PM, Christoph Bartoschek  wrote:
> Florian Haas wrote:
>
>>> Thus I would expect to have a write performance of about 100 MByte/s. But
>>> dd gives me only 20 MByte/s.
>>>
>>> dd if=/dev/zero of=bigfile.10G bs=8192  count=1310720
>>> 1310720+0 records in
>>> 1310720+0 records out
>>> 10737418240 bytes (11 GB) copied, 498.26 s, 21.5 MB/s
>>
>> If you used that same dd invocation for your local test that allegedly
>> produced 450 MB/s, you've probably been testing only your page cache.
>> Add oflag=dsync or oflag=direct (the latter will only work locally, as
>> NFS doesn't support O_DIRECT).
>>
>> If your RAID is one of reasonably contemporary SAS or SATA drives,
>> then a sustained to-disk throughput of 450 MB/s would require about
>> 7-9 stripes in a RAID-0 or RAID-10 configuration. Is that what you've
>> got? Or are you writing to SSDs?
>
> I used the same invocation with different filenames each time. To which page
> cache do you refer? To the one on the client or on the server side?
>
> We are using RAID-1 with 6 x 2 disks. I have repeated the local test 10
> times with different files in a row:
>
> for i in `seq 10`; do time dd if=/dev/zero of=bigfile.10G.$i bs=8192
> count=1310720; done
>
> The resulting values on a system that is also used by other programs as
> reported by dd are:
>
> 515 MB/s, 480 MB/s, 340 MB/s, 338 MB/s, 360 MB/s, 284 MB/s, 311 MB/s, 320
> MB/s, 242 MB/s,  289 MB/s
>
> So I think that the system is capable of more than 200 MB/s, which is way
> more than what can arrive over the network.

A bit off-topic maybe.

Whenever you run these kinds of local disk performance tests, to
measure actual speed and not some caching, you should, as Florian
said, use the oflag=direct option to dd, and also do echo 3 >
/proc/sys/vm/drop_caches followed by sync.

I usually use:

echo 3 > /proc/sys/vm/drop_caches && sync && date && \
  time dd if=/dev/zero of=whatever bs=1G count=x oflag=direct && \
  sync && date

You can tell whether data is still being flushed when the throughput
reported by dd differs from the one you get by dividing the amount of
data written by the time between the two date calls. It also helps to
push more data than the controller can cache.

Regards,
Dan

>
> I've done the measurements on the filesystem that sits on top of LVM and
> DRBD. Thus I think that DRBD is not a problem.
>
> However, the strange thing is that I get 108 MB/s on the clients as soon as
> I disable the secondary node for DRBD. Maybe there is a strange interaction
> between DRBD and NFS.
>
> After reenabling the secondary node the DRBD synchronization is quite slow.
>
>
>>>
>>> Has anyone an idea what could cause such problems? I have no idea for
>>> further analysis.
>>
>> As a knee-jerk response, that might be the classic issue of NFS
>> filling up the page cache until it hits the vm.dirty_ratio and then
>> having a ton of stuff to write to disk, which the local I/O subsystem
>> can't cope with.
>
> Sounds reasonable, but shouldn't the I/O subsystem be capable of writing
> away anything that arrives?
>
> Christoph
>
>



-- 
Dan Frincu
CCNA, RHCE



Re: [Pacemaker] Corosync / Pacemaker Cluster crashing

2012-04-20 Thread Dan Frincu
be_complete=true: cib not
> connected
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]: info:
> do_lrm_rsc_op: Performing key=9:5:0:e6a3b9c7-c24d-497a-9c07-d6082ee231a9
> op=lcdcv01_stop_0 )
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group lrmd: [22447]: info:
> rsc:lcdcv01:4: stop
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]: info:
> attrd_trigger_update: Sending flush op to all hosts for: probe_complete
> (true)
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]: info:
> attrd_perform_update: Delaying operation probe_complete=true: cib not
> connected
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group lrmd: [22447]: info:
> RA output: (lcdcv01:stop:stderr) logd is not running
> Apr 20 10:54:38 lxdcv01nd01.bauer-uk.bauermedia.group crmd: [22450]: info:
> process_lrm_event: LRM operation lcdcv01_stop_0 (call=4, rc=0, cib-update=9,
> confirmed=true) ok
> Apr 20 10:54:38 corosync [TOTEM ] ring 1 active with no faults
> Apr 20 10:54:41 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]: info:
> cib_connect: Connected to the CIB after 1 signon attempts
> Apr 20 10:54:41 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]: info:
> cib_connect: Sending full refresh
> Apr 20 10:54:41 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]: info:
> attrd_trigger_update: Sending flush op to all hosts for: probe_complete
> (true)
> Apr 20 10:54:41 lxdcv01nd01.bauer-uk.bauermedia.group attrd: [22448]: info:
> attrd_perform_update: Sent update 4: probe_complete=true
>
>
> Bauer Corporate Services UK LP (BCS) is a division of the Bauer Media Group
> the
> largest consumer publisher in the UK, and second largest commercial radio
> broadcaster. BCS provides financial services and manages and develops IT
> systems
> on which our UK publishing, broadcast, digital and partner businesses
> depend.
>
> The information in this email is intended only for the addressee(s) named
> above.
> Access to this email by anyone else is unauthorised. If you are not the
> intended
> recipient of this message any disclosure, copying, distribution or any
> action
> taken in reliance on it is prohibited and may be unlawful. Bauer Corporate
> Services do not warrant that any attachments are free from viruses or other
> defects and accept no liability for any losses resulting from infected email
> transmissions.
>
> Please note that any views expressed in this email may be those of the
> originator and do not necessarily reflect those of this organisation.
>
> Bauer Corporate Services UK LP is registered in England; Registered address
> is
> 1 Lincoln Court, Lincoln Road, Peterborough, PE1 2RF.
>
> Registration number LP13195
>
>



-- 
Dan Frincu
CCNA, RHCE



Re: [Pacemaker] crm_mon on Node-2 shows both Node-1 & Node-2 as online but crm_mon on Node-1 shows Node-2 as offline

2012-04-20 Thread Dan Frincu
On Fri, Apr 20, 2012 at 3:09 AM, Andrew Beekhof  wrote:
> On Thu, Apr 19, 2012 at 11:51 PM, Dan Frincu  wrote:
>> Hi,
>>
>> On Thu, Apr 19, 2012 at 3:56 PM, Parshvi  wrote:
>>> 1) What is the use of ssh without pass key between cluster nodes in 
>>> pacemaker ?
>>>  a. Use case:
>>>    i. Two nodes in a cluster (Call them Node-1 and Node-2)
>>>    ii. One interface configured in corosync.conf for its heartbeat or
>>> messaging. Eg. Bind net addr : 192.168.10.0
>>>    iii. Another interface configured in /etc/hosts for hostname resolution.
>>>    Eg. IP: 192.168.129.10 Hostname: Node-1
>>>    Eg. IP: 192.168.129.11 Hostname: Node-2
>>>    iv. Hence for all ssh communication between the two nodes, hostname 
>>> resolves
>>> to subnet 129 address.
>>>    v. 12 services configured in active/passive mode
>>>    vi. 1 service configured in master/slave mode
>>>    vii. 8 services are non-sticky (they failback) in active/passive
>>>    viii. 4 services are sticky (do not failback) in active/passive
>>>    ix. Distribution: Node-1 is primary for 8 services (of which 4 are non-
>>> sticky), Node-2 is preferred for 4 services of a total 12 (non-sticky)
>>>
>>>  b. Observations:
>>>    i. On Node-2, the interface was down over which IP: 192.168.129.11 
>>> Hostname:
>>> Node-2 was configured.
>>>    ii. On Node-1 all interfaces were up.
>>>    iii. Interface used by corosync for hearbeat/messaging was up at all 
>>> times
>>> (Bind net addr : 192.168.10.0)
>>>    iv. In crm_mon: Node-1 sees Node-2 as offline
>>>        cibadmin --query fails to work (remote node did not respond)
>>>    v. In crm_mon: Node-2 sees Node-1 as online
>>>    vi. All the services were seen active on Node-1 (including those that 
>>> were
>>> preferred for Node-2). Observed in crm_mon output.
>>>    vii. 4 services for which Node-2 was preferred were seen active Node-2 
>>> also
>>> (hence 4 services active on both the nodes).
>>>    Observed in crm_mon output: Only 4 services were shown active, the 
>>> status of
>>> the rest of the services active on Node-1 did not reflect in crm_mon
>>>    Even though crm_mon on Node-2 sees Node-1 as “online”.
>>>  c. Errors in log file:
>>>    i. On Node-2:
>>>      1. Resource ocf::RscRA:rsc appears to be active on 2 nodes
>>>      2. The above error appears for all the resources configured in 
>>> pacemaker.
>>>
>>>
>>> Query:
>>> 1) For what purpose does Pacemaker require “ssh without a pass key” to be
>>> enabled between the nodes in a cluster ?
>>
>> scp
>
> But pacemaker doesn't use scp... or is this in relation to the
> clusters from scratch document?

It's in relation to the Clusters from Scratch document.

> -ECONFUSED

Sorry about that ;)

>
>>
>>> 2) For what purpose does Pacemaker use Node “hostname” for ? how Node 
>>> “hostname”
>>> come into picture ?
>>
>> When choosing where to allocate resources not explicitly tied to a node. See
>>
>> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/#node-score-equal
>>
>> and
>>
>> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/#_background
>>
>>> 3) Let’s say in a two node cluster two communication paths are available 
>>> between
>>> the two nodes.
>>>  a. Eth1 and eth2.
>>>  b. The hostname of the node resolves to IP Address on eth1.
>>>  c. Consider, eth1 (network cable disconnected) goes down.
>>>  d. Eth2 is up, but hostname does not resolve to the IP on eth2 (resolves to
>>> eth1 addr).
>>
>> Inter-node communication is usually specified by IP address, and
>> redundant connections (as in your case) are recommended.
>>
>>>  e. Will this (hostname) have any issue ?
>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>> Dan Frincu
>> CCNA, RHCE
>>

Re: [Pacemaker] Corosync service taking 100% cpu and is unable to stop gracefully

2012-04-19 Thread Dan Frincu
On Thu, Apr 19, 2012 at 4:14 PM, Parshvi  wrote:
> Dan Frincu  writes:
>
>>
>> Hi,
>>
>> On Thu, Apr 19, 2012 at 2:11 PM, Parshvi  gmail.com> wrote:
>> > Major issues:
>> > 1) Corosync reaching over 100% cpu usage.
>> > 2) Corosync unable to stop gracefully.
>> > 3) Virtual IP of a resource being assigned as the primary IP on an
>> > interface, after a cable disconnect/reconnect on that interface. The
>> > static IP on the interface shown as global secondary IP.
>> >
>> > Use case:
>> > 1) Two nodes in a cluster.
>> > 2) Two communication paths exist between the two nodes, with “rrp_mode”
>> > set to active in corosync.conf
>> > active in corosync.conf
>>
>> Are both links of the same speed?
> yes. speed of each: 1000Mb/s
>>
>> >  a. One path is a back-to-back connection between the nodes.
>> >  b. Second is  via the LAN network  switch.
>> > 3) The network cable was unplugged on one of the nodes for a while (on both
> the
>> > interfaces). It was reconnected after a short while.
>> >
>> > Observations:
>> > 1) Corosync service was taking 100% cpu on the node whose link was down:
>>
>> What version of Corosync? What OS?
> Corosync Cluster Engine, version '1.2.7' SVN revision '3008'
> OEL (Oracle Enterprise Linux release 5.6)

You need a newer version of Corosync. For redundant rings to work, you
need 1.3.x or higher; for self-healing redundant rings, 1.4.x.

>>
>
>> Can you pastebin.com your crm configure show?
> would do that in a followup mail.
>
> Thanks for a quick response Dan.
>
> Here is a snapshot of top:
>
>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  4726 root      RT   0  201m 5576 2004 R 100.4  0.1  36:35.31 corosync
>
> Logs and core file have been saved and can be posted if required.
> My response inline.
>
>
>
>
>



-- 
Dan Frincu
CCNA, RHCE



Re: [Pacemaker] crm_mon on Node-2 shows both Node-1 & Node-2 as online but crm_mon on Node-1 shows Node-2 as offline

2012-04-19 Thread Dan Frincu
Hi,

On Thu, Apr 19, 2012 at 3:56 PM, Parshvi  wrote:
> 1) What is the use of ssh without a pass key between cluster nodes in
> pacemaker?
>  a. Use case:
>    i. Two nodes in a cluster (Call them Node-1 and Node-2)
>    ii. One interface configured in corosync.conf for its heartbeat or
> messaging. Eg. Bind net addr : 192.168.10.0
>    iii. Another interface configured in /etc/hosts for hostname resolution.
>    Eg. IP: 192.168.129.10 Hostname: Node-1
>    Eg. IP: 192.168.129.11 Hostname: Node-2
>    iv. Hence for all ssh communication between the two nodes, hostname 
> resolves
> to subnet 129 address.
>    v. 12 services configured in active/passive mode
>    vi. 1 service configured in master/slave mode
>    vii. 8 services are non-sticky (they failback) in active/passive
>    viii. 4 services are sticky (do not failback) in active/passive
>    ix. Distribution: Node-1 is primary for 8 services (of which 4 are non-
> sticky), Node-2 is preferred for 4 services of a total 12 (non-sticky)
>
>  b. Observations:
>    i. On Node-2, the interface was down over which IP: 192.168.129.11 
> Hostname:
> Node-2 was configured.
>    ii. On Node-1 all interfaces were up.
>    iii. Interface used by corosync for hearbeat/messaging was up at all times
> (Bind net addr : 192.168.10.0)
>    iv. In crm_mon: Node-1 sees Node-2 as offline
>        cibadmin --query fails to work (remote node did not respond)
>    v. In crm_mon: Node-2 sees Node-1 as online
>    vi. All the services were seen active on Node-1 (including those that were
> preferred for Node-2). Observed in crm_mon output.
>    vii. 4 services for which Node-2 was preferred were seen active Node-2 also
> (hence 4 services active on both the nodes).
>    Observed in crm_mon output: Only 4 services were shown active, the status 
> of
> the rest of the services active on Node-1 did not reflect in crm_mon
>    Even though crm_mon on Node-2 sees Node-1 as “online”.
>  c. Errors in log file:
>    i. On Node-2:
>      1. Resource ocf::RscRA:rsc appears to be active on 2 nodes
>      2. The above error appears for all the resources configured in pacemaker.
>
>
> Query:
> 1) For what purpose does Pacemaker require “ssh without a pass key” to be
> enabled between the nodes in a cluster ?

scp

> 2) For what purpose does Pacemaker use the node “hostname”? How does the
> node “hostname” come into the picture?

When choosing where to allocate resources not explicitly tied to a node. See

http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/#node-score-equal

and

http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/#_background

> 3) Let’s say in a two-node cluster two communication paths are available
> between the two nodes.
>  a. Eth1 and eth2.
>  b. The hostname of the node resolves to IP Address on eth1.
>  c. Consider, eth1 (network cable disconnected) goes down.
>  d. Eth2 is up, but hostname does not resolve to the IP on eth2 (resolves to
> eth1 addr).

Inter-node communication is usually specified by IP address, and
redundant connections (as in your case) are recommended.
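
For example, a redundant ring setup in corosync.conf looks roughly like
this (only a sketch; the addresses are assumptions, adjust them to your
networks):

totem {
    version: 2
    rrp_mode: passive
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.10.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
    interface {
        ringnumber: 1
        bindnetaddr: 192.168.20.0
        mcastaddr: 226.94.1.2
        mcastport: 5407
    }
}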

>  e. Will this (hostname) have any issue ?
>
>
>
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



-- 
Dan Frincu
CCNA, RHCE

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Corosync service taking 100% cpu and is unable to stop gracefully

2012-04-19 Thread Dan Frincu
Hi,

On Thu, Apr 19, 2012 at 2:11 PM, Parshvi  wrote:
> Major issues:
> 1) Corosync reaching over 100% cpu usage.
> 2) Corosync unable to stop gracefully.
> 3) Virtual IP of a resources being assigned as the primary IP on a interface,
> after a cable disconnect/reconnect on that interface. The static IP on the
> interface shown as global secondary IP.
>
> Use case:
> 1) Two nodes in a cluster.
> 2) Two communication paths exists between the two nodes, with “rrp_mode” set 
> to
> active in corosync.conf

Are both links of the same speed?

>  a. One path is a back-to-back connection between the nodes.
>  b. Second is  via the LAN network  switch.
> 3) The network cable was unplugged on one of the nodes for a while (on both 
> the
> interfaces). It was reconnected after a short while.
>
> Observations:
> 1) Corosync service was taking 100% cpu on the node whose link was down:

What version of Corosync? What OS?

>  a. In the above scenario Corosync service could not be stopped gracefully. A
> SIGKILL had to be issued to stop the service.
>  b. On this node, of the two interfaces configured in corosync.conf, one was
> being used for the Virtual IP’s preferred eth.
>    i. It was observed that when the link was up after a disconnection, the
> primary global IP on that interface was the Virtual IP configured for a
> resource.
>    ii. The static IP assigned to the interface was listed as “scope global
> secondary” in the output of `ip addr show`.
>    iii. Also the Virtual IP of the resources configured in pacemaker were
> active on both the nodes.

Can you pastebin.com your crm configure show?

>    iv. `service network restart` also did not work.
>  c. Coroysnc service was stopped (Killed since it could not be stopped), the
> network service was re-started and then corosync was re-started. All good 
> after
> this.
>
>
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



-- 
Dan Frincu
CCNA, RHCE

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] start/stop operations fail to happen in parallel on resources

2012-04-19 Thread Dan Frincu
Hi,

On Thu, Apr 19, 2012 at 2:22 PM, Parshvi  wrote:
> Observations:
> max-children=30
> total no. of resources=18
>
> 1) At a default value 4 of max-children, following logs were observed
> that led to monitor op’s timeout for some resources (a total of 18 rscs):
>  a. “max_child_count (4) reached, postponing execution of operation monitor”
>  b. “WARN: perform_ra_op: the operation operation monitor[18] on
> ocf::IPaddr2::ClusterIP for client 3754, stayed in operation list for
> 14100 ms (longer than 10000 ms)"
>  c. SOLUTION: the max-children of lrmd was raised to 30.
>  d. ISSUES STILL OBSERVED: while 2-3 resources are stuck in start operation,
> if a rsc is issued an explicit start command `crm resource start rcs1`, then 
> the
> start op on this rsc is delayed until any one of the previous resources exit
> from their start operation.

What version of Pacemaker?
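
As a side note, if you need to raise the lrmd limit at runtime,
cluster-glue's lrmadmin should be able to do it, along the lines of
(treat the exact syntax as an assumption and check lrmadmin's help):

lrmadmin -p max-children 30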

>
>
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



-- 
Dan Frincu
CCNA, RHCE

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Problem during Cluster Upgrade

2012-04-12 Thread Dan Frincu
Hi,

On Wed, Apr 11, 2012 at 5:26 PM, Karl Rößmann  wrote:
> Hi all,
>
> I'm upgrading a three node cluster from SLES 11 SP1 to SLES SP2 node by
> node.
> the upgrade includes:
>         corosync-1.3.3-0.3.1     to corosync-1.4.1-0.13.1
>         pacemaker-1.1.5-5.9.11.1 to pacemaker-1.1.6-1.27.26
>         kernel 2.6.32.54-0.3-xen to 3.0.13-0.27-xen
>
> After Upgrading the first node and restarting the cluster I get these
> never ending messages on the DC (which is not the updated node)
>
> Apr 11 14:19:26 orion14 corosync[6865]:   [TOTEM ] Type of received message
> is wrong...  ignoring 6.
> Apr 11 14:19:27 orion14 corosync[6865]:   [TOTEM ] Type of received message
> is wrong...  ignoring 6.
> Apr 11 14:19:28 orion14 corosync[6865]:   [TOTEM ] Type of received message
> is wrong...  ignoring 6.
> Apr 11 14:19:29 orion14 corosync[6865]:   [TOTEM ] Type of received message
> is wrong...  ignoring 6.

I think the question relates more to corosync, added the proper group in CC.

>
> the updated node is still in STANDBY mode.
> Should I ignore the message and put the mode to ONLINE ?
> I don't want the cluster to crash, there are running services
> on the other two nodes.
> So now I stopped the openais on the updated node: no more messages.
> the other two nodes are still up and working.
>
> Any ideas ?

I don't know exactly if a rolling upgrade is possible (I may be wrong
on this one) but putting the cluster in maintenance-mode, upgrading
corosync and pacemaker on all 3 nodes and then re-probing for the
resources is a more common upgrade path. If there are no issues on the
reprobe, then you could take the cluster out of maintenance-mode.
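
Roughly, and assuming the crm shell is available throughout, something
like:

crm configure property maintenance-mode=true
# upgrade corosync and pacemaker on all 3 nodes, restart the stack
crm resource reprobe
# if crm_mon looks clean afterwards:
crm configure property maintenance-mode=false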

Also, do you have a support contract with Suse? I think their support
can help out more on this.

HTH,
Dan

>
> Karl
>
>
>
> --
> Karl Rößmann                            Tel. +49-711-689-1657
> Max-Planck-Institut FKF                 Fax. +49-711-689-1632
> Postfach 800 665
> 70506 Stuttgart                         email k.roessm...@fkf.mpg.de
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



-- 
Dan Frincu
CCNA, RHCE

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Doc: Utilization and Placement Strategy

2012-02-08 Thread Dan Frincu
Hi,

On Wed, Feb 8, 2012 at 1:11 PM, Gao,Yan  wrote:
> Hi,
>
> The feature "Utilization and Placement Strategy" has been provided for
> quite some time. But it is still missing documentation. (Florian reminded
> us, thanks a lot!).
>
> The attached documentation are based on a blog by Andrew and the
> material from SUSE HAE guide written by Tanja Roth and Thomas Schraitle.
> I added the details about the resource allocation strategy.
>
> One is crm shell syntax version, the other is XML syntax version for
> "Pacemaker_Explained".
>
> If you are interested, please help review it. Any comments or revisions
> are welcome and appreciated!

I've reviewed both files and made some minor additions and fixed a
couple of typos, other than that looks great.

One question though, shouldn't these have been in Docbook format?

Regards,
Dan

>
> Regards,
>  Gao,Yan
> --
> Gao,Yan 
> Software Engineer
> China Server Team, SUSE.
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>



-- 
Dan Frincu
CCNA, RHCE


utilization-and-placement-strategy
Description: Binary data


utilization-and-placement-strategy-crm-shell
Description: Binary data
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] MySQL Master-Master replication with Corosync and Pacemaker

2012-01-26 Thread Dan Frincu
Hi,

On Thu, Jan 26, 2012 at 1:43 AM, Peter Scott  wrote:
> Hello.  Our problem is that a Corosync restart on the idle machine in a
> 2-node cluster shutds down the mysqld process there and we need it to stay
> up for replication.  We are very new to Corosync and Pacemaker and have been
> slogging through every tutorial and document we can find.
>
> Here's the detail: We have two MySQL comasters (each is a master and a slave
> of the other).  Traffic needs to arrive at only one machine at a time
> because otherwise conflicting simultaneous updates at each machine would
> cause a problem.  There is a single IP for clients (192.168.185.50, see
> below).
>
> After much sweating, we came up with the configuration below.  It works: if
> we kill the machine that's in use we see it switch to the other one.  MySQL
> connections are seamlessly rerouted.
>
> The problem is this: Say that dev-mysql01 is the active node.  If we restart
> Corosync on dev-mysql02, it stops mysqld there and does not restart it.  We
> can of course restart it manually but we want to understand why this is
> happening because it surprises us and maybe there are other circumstances
> under which it would either stop mysqld or fail to restart it.

Corosync is the first layer in the cluster stack (membership and
messaging), Pacemaker is the second layer (cluster resource
management), your services are on the third layer.

You take down the bottom layer, that ensures communication, the upper
layers have no way to talk to the rest of the cluster.

Bottom line, when services are controlled by the cluster and through
manual intervention the processes that control them are stopped,
everything under their control stops as well.

If this is intended for administrative purposes, follow Florian's advice.
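
One sketch of such an administrative restart (not necessarily what
Florian described) would be:

crm configure property maintenance-mode=true
# restart corosync on dev-mysql02; the resources are unmanaged, so
# pacemaker leaves mysqld alone
crm configure property maintenance-mode=false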

HTH,
Dan

>
> mysqld has to run on the inactive machine so that the active one can
> replicate all the transactions there, so that if the active one goes down
> the inactive one can come up in the current state.
>
> Why is a Corosync restart stopping mysqld?
>
> Here's our configuration:
>
> node dev-mysql01
> node dev-mysql02
> primitive DBIP ocf:heartbeat:IPaddr2 \
>        params ip="192.168.185.50" cidr_netmask="24" \
>        op monitor interval="30s"
> primitive mysql ocf:heartbeat:mysql \
>        params binary="/usr/bin/mysqld_safe" config="/etc/my.cnf"
> datadir="/var/lib/mysql" user="mysql" pid="/var/run/mysqld/mysqld.pid"
> socket="/var/lib/mysql/mysql.sock" test_passwd="secret"
> test_table="lbcheck.lbcheck" test_user="lbcheck" \
>        op monitor interval="20s" timeout="10s" \
>        meta migration-threshold="10"
> group mysql_group DBIP mysql
> location master-prefer-node1 mysql_group 50: dev-mysql01
> property $id="cib-bootstrap-options" \
>        dc-version="1.1.2-f059ec7ced7a86ff4a0b963bccfe" \
>        cluster-infrastructure="openais" \
>        expected-quorum-votes="2" \
>        stonith-enabled="false" \
>        no-quorum-policy="ignore"
> rsc_defaults $id="rsc-options" \
>        resource-stickiness="100"
>
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



-- 
Dan Frincu
CCNA, RHCE

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] corosync vs. pacemaker 1.1

2012-01-26 Thread Dan Frincu
Hi,

On Wed, Jan 25, 2012 at 5:08 PM, Kiss Bence  wrote:
> Hi,
>
> I am newbie to the clustering and I am trying to build a two node
> active/passive cluster based upon the documentation:
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>
> My systems are Fedora 14, uptodate. After forming the cluster as wrote, I
> started to test it. (resources: drbd-> lvm-> fs ->group of services)
> Resources moved around, nodes rebooted and killed (first I tried it in
> virtual environment then also on real machines).
>
> After some events the two nodes ended up in a kind of state of split-brain.
> The crm_mon showed me that the other node is offline at both nodes although
> the drbd subsystem showed everything in sync and working. The network was
> not the issue (ping, tcp and udp communications were fine). Nothing changed
> from the network view.
>
> At first the rejoining took place quite well, but after some more events it
> took longer, and after more events it didn't. The network dump showed me the
> multicast packets still coming and going. At corosync (crm_node -l) the
> other node didn't appeared both on them. After trying configuring the cib
> logs was full of messages like ": not in our membership".
>
> I tried to erase the config (crm configure erase, cibadmin -E -f) but it
> worked only locally. I noticed that the pacemaker process didn't started up
> normally on the node that was booting after the other. I also tried to
> remove files from /var/lib/pengine/ and /var/lib/hearbeat/crm/ but only the
> resources are gone. It didn't help on forming a cluster without resources.
> The pacemaker process exited some 20 minutes after it started. Manual
> starting was the same.
>
> After digging through Google for answers I found nothing helpful. Following
> some tips, I changed the version to 1.1 in the /etc/corosync/service.d/pcmk
> file (this is the version of pacemaker in this distro). I realized that the
> cluster processes were started by corosync itself, not by pacemaker, which
> could be omitted. The cluster forming is stable after this change, even after
> many, many events.
>
> Now I reread the document mentioned above, and I wonder why it wrote the
> "Important notice" on page 37. What is wrong theoretically with my scenario?
> Why does it working? Why didn't work the config suggested by the document?
>
> Tests were done first on virtual machines running Fedora 14 (1 CPU core, 512MB
> RAM, 10G disk, 1G drbd on a logical volume, physical volume on drbd forming a
> volume group named cluster) per node.
>
> Then on real machines. They have more cpu cores (4), more RAM (4G) and more
> disk (mirrored 750G), 180G drbd, and a 100M guaranteed routed link between the
> nodes 5 hops away.
>
> By the way how should one configure the corosync to work on multicast routed
> network? I had to create an openvpn tap link between the real nodes for
> working. The original config with public IP-s didn't worked. Is corosync
> equipped to cope with the multicast pim messages? Or it was a firewall
> issue.

First question, what versions of software are on each of the nodes?

When using multicast, corosync doesn't care about "routing" the
messages AFAIK; it relies on the network layer to do its job. Now the
"split-brain" you mention can take place due to network interruption,
or due to missing or untested fencing as well.

Second question, do you have fencing configured?

You've mentioned 2(?) nodes "5 hops away", I'm guessing they're not in
the same datacenter. If so, did you also test the latency on the
network between endpoints? Also can you make sure PIM routing is
enabled on all of the "hops" along the way?

Your scenario seems to be a split-site, so you may be interested in
https://github.com/jjzhang/booth as well.

Regards,
Dan

>
> Thanks in advance,
> Bence
>
> ___________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



-- 
Dan Frincu
CCNA, RHCE

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Best setup for lots and lots of IPs

2012-01-20 Thread Dan Frincu
Hi,

On Thu, Jan 19, 2012 at 9:49 PM, Anton Melser  wrote:
> Hi,
> I want to set up a very simple NAT device for natting around 2000
> internal /24 networks to around 2000 external IPs (1 /24 = 1 public
> IP). That part works fine (and is *extremely* efficient, I have it on
> a pretty powerful machine but cpu is 0% with 2gbps going through!)
> with iproute2 and iptables. I want it to have some failover though...
> I am discovering everything here (including iproute2 and iptables),
> and someone suggested I look at corosync + pacemaker. I did the
> tutorial (btw if I end up using this I'll translate it into French if
> you would like) and things seemed to work fine for a few IPs...
> However, my
>
> crm configure primitive ClusterIP.ABC ocf:heartbeat:IPaddr2 params
> ip=10.A.B.C cidr_netmask=32 op monitor interval=120s
>
> commands started to slow down around 200 IPs and then to a crawl at
> 500-600 or so. It got to around 1000 before I stopped the VMs I was
> testing on to move them onto a much more powerful VM host. It is
> taking an absolute age to get back up again. This may be normal, and
> there may be no way around it with any decent solution - I simply have
> no idea.
> Am I trying to achieve something with the wrong tools here? I don't
> need any sort of connection tracking or anything - we can handle up to
> even maybe 5 minutes of downtime (as long as it's not regularly
> happening). The need is relatively simple but the numbers of
> networks/IPs may make this unwieldy using these tools.
> Any pointers?

There are a couple of performance related topics that you can look at
for further reference.

http://www.gossamer-threads.com/lists/linuxha/pacemaker/77382?do=post_view_threaded
http://www.gossamer-threads.com/lists/linuxha/pacemaker/77384?do=post_view_threaded

However the way I see it in your scenario I would take another
approach. Mind you this is just an opinion on the matter, nothing
else, but I would either update the IPaddr2 script or create a new one
based on it that would either:

a) take 1000 parameters (and internally do a for loop, because I'd
rather have 1 script with 1000 parameters than 1000 scripts with 1
parameter)

b) (based on the use case of 2000 IP's I'd guess you have at least a
/21 public subnet available - or even larger - and based on good
practice I'd also guess these IP's are given from a continuous range,
in which case the script would) take a start IP and end IP as
parameters, and perform a for loop for the resulting range (thus using
only 2 parameters for the IP definition, and the other parameters I've
seen in the example were netmask and monitoring interval, a grand
total of 4).
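
As a very rough sketch of option b), with hypothetical start_ip/end_ip
parameters and assuming the range only varies in the last octet (this
is not the real IPaddr2 code):

prefix=${OCF_RESKEY_start_ip%.*}    # e.g. 10.1.2
first=${OCF_RESKEY_start_ip##*.}    # e.g. 1
last=${OCF_RESKEY_end_ip##*.}       # e.g. 254
for i in $(seq "$first" "$last"); do
    ip -f inet addr add "$prefix.$i/32" dev eth0 || exit 1
done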

From my point of view, such a high number of resources in a Pacemaker
cluster for the sole purpose of adding/removing IP addresses is an
overkill, and another solution, such as the one I suggested makes more
sense. Of course, I went on the assumption that all of these IP's are
either needed all together or not at all, but even if this is not the
case, I doubt you need individual rules per IP, more along the line of
needing to control a large range + some corner cases with individual
assignments, the latter being possible with IPaddr2 just as usual
whilst keeping the total number of resources significantly lower.

The problem with 1000 resources is that when going into the monitoring
part, you can only monitor $LRMD_MAX_CHILDREN resources at a time
(which by default is 4), so you can increase this number and have n
monitor operations run in parallel. You'll have to see how the
timeouts fit in with the increased monitor operations and if there is
a negative effect on performance due to the increased number of
monitor operations.

HTH,
Dan

> Thanks heaps,
> Anton
>
> --
> echo '16i[q]sa[ln0=aln100%Pln100/snlbx]sbA0D4D465452snlbxq' | dc
> This will help you for 99.9% of your problems ...
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



-- 
Dan Frincu
CCNA, RHCE

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] rsc_ticket and 1.2 rng

2012-01-16 Thread Dan Frincu
Hi,

On Mon, Jan 16, 2012 at 5:58 PM, Vladislav Bogdanov
 wrote:
> Hi Andrew,
>
> is it intentional that 1.2 schema which is now default misses rsc_ticket
> which is now not only works but even well documented by suse?

Sorry to barge in, but there is a pull request related to this issue.

https://github.com/ClusterLabs/pacemaker/pull/6

HTH,
Dan

>
> Vladislav
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



-- 
Dan Frincu
CCNA, RHCE

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Cannot Create Primitive in CRM Shell

2012-01-09 Thread Dan Frincu
Hi,

On Mon, Jan 9, 2012 at 1:44 PM, Florian Haas  wrote:
> On Mon, Jan 9, 2012 at 11:42 AM, Dan Frincu  wrote:
>> Hi,
>>
>> On Fri, Jan 6, 2012 at 11:24 PM, Andrew Martin  wrote:
>>> Hello,
>>>
>>> I am working with DRBD + Heartbeat + Pacemaker to create a 2-node
>>> highly-available cluster. I have been following this official guide on
>>> DRBD's website for configuring all of the components:
>>> http://www.linbit.com/fileadmin/tech-guides/ha-nfs.pdf
>>>
>>> However, once I go to configure the primitives in pacemaker's CRM shell
>>> (section 4.1 in the PDF above) I am unable to create the primitive. For
>>> example, I enter the following configuration for a DRBD device called
>>> "drive":
>>> primitive p_drbd_drive \
>>>
>>>   ocf:linbit:drbd \
>>>
>>>   params drbd_resource="drive" \
>>>
>>>   op monitor interval="15" role="Master" \
>>>
>>>   op monitor interval="30" role="Slave"
>>>
>>> After entering all of these lines I hit enter and nothing is returned - it
>>> appears frozen and I am never returned to the "crm(live)configure# " shell.
>>> An strace of the process does not reveal any obvious blocks. I have also
>>> tried entering the entire configuration on a single line with the same
>>> result.
>>
>> I would recommend going through this guide first
>> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/
>
> That's a bit of a knee-jerk response if I may say so, and when I wrote
> those guides[1] the intention was specifically that people could
> peruse them _without_ first having to check the documentation that
> covers the configuration internals.

I apologize if it came through as a "knee-jerk response" on my behalf,
if I don't understanding the technology I work with, I look at the
docs, that's why I always point others to the documentation as well.

I have followed the tech guides in reference many times and I'm not in
any way implying that they shouldn't be followed ad-literam, I've
explained in my previous statement why I recommend the docs.

Sorry for the noise.

>
> At any rate, Andrew, if your crm shell is freezing up when you're
> simply trying to add a primitive, something must be seriously awry in
> your setup -- it's something that I've not run into personally, unless
> the cluster was already responding to an error state on one of the
> nodes. Are you sure your cluster is behaving OK otherwise? Are you
> getting meaningful output from "crm_mon -1"? Does your cluster report
> it has successfully elected a DC?
>
> Cheers,
> Florian
>
> [1] Which I did while employed by Linbit, which is no longer the case,
> as they have asked I point out. http://wp.me/p4XzQ-bN
>
> --
> Need help with High Availability?
> http://www.hastexo.com/now
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



-- 
Dan Frincu
CCNA, RHCE

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Cannot Create Primitive in CRM Shell

2012-01-09 Thread Dan Frincu
Hi,

On Fri, Jan 6, 2012 at 11:24 PM, Andrew Martin  wrote:
> Hello,
>
> I am working with DRBD + Heartbeat + Pacemaker to create a 2-node
> highly-available cluster. I have been following this official guide on
> DRBD's website for configuring all of the components:
> http://www.linbit.com/fileadmin/tech-guides/ha-nfs.pdf
>
> However, once I go to configure the primitives in pacemaker's CRM shell
> (section 4.1 in the PDF above) I am unable to create the primitive. For
> example, I enter the following configuration for a DRBD device called
> "drive":
> primitive p_drbd_drive \
>
>   ocf:linbit:drbd \
>
>   params drbd_resource="drive" \
>
>   op monitor interval="15" role="Master" \
>
>   op monitor interval="30" role="Slave"
>
> After entering all of these lines I hit enter and nothing is returned - it
> appears frozen and I am never returned to the "crm(live)configure# " shell.
> An strace of the process does not reveal any obvious blocks. I have also
> tried entering the entire configuration on a single line with the same
> result.

I would recommend going through this guide first
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/

>
> What can I try to debug this and move forward with configuring pacemaker? Is
> there a command I can use to completely clear out pacemaker to perhaps start
> fresh?

crm configure erase

It will however do what it says, so use it with caution, you have been warned.

>
> Thanks,
>
> Andrew
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>



-- 
Dan Frincu
CCNA, RHCE

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] syslog full of redundand link messages

2012-01-09 Thread Dan Frincu
Hi,

On Sun, Jan 8, 2012 at 1:59 AM, Attila Megyeri
 wrote:
> Hi All,
>
> My syslogs are full of messages like this:
>
> Jan  7 23:55:47 oa2 corosync[362]:   [TOTEM ] received message requesting
> test of ring now active
> Jan  7 23:55:48 oa2 corosync[362]:   [TOTEM ] received message requesting
> test of ring now active
> Jan  7 23:55:48 oa2 corosync[362]:   [TOTEM ] received message requesting
> test of ring now active
> Jan  7 23:55:48 oa2 corosync[362]:   [TOTEM ] received message requesting
> test of ring now active
> Jan  7 23:55:49 oa2 corosync[362]:   [TOTEM ] received message requesting
> test of ring now active
> Jan  7 23:55:49 oa2 corosync[362]:   [TOTEM ] received message requesting
> test of ring now active
> Jan  7 23:55:49 oa2 corosync[362]:   [TOTEM ] received message requesting
> test of ring now active
> Jan  7 23:55:50 oa2 corosync[362]:   [TOTEM ] received message requesting
> test of ring now active
> Jan  7 23:55:50 oa2 corosync[362]:   [TOTEM ] received message requesting
> test of ring now active
> Jan  7 23:55:50 oa2 corosync[362]:   [TOTEM ] received message requesting
> test of ring now active
> Jan  7 23:55:51 oa2 corosync[362]:   [TOTEM ] received message requesting
> test of ring now active
> Jan  7 23:55:51 oa2 corosync[362]:   [TOTEM ] received message requesting
> test of ring now active
> Jan  7 23:55:51 oa2 corosync[362]:   [TOTEM ] received message requesting
> test of ring now active
> Jan  7 23:55:52 oa2 corosync[362]:   [TOTEM ] received message requesting
> test of ring now active
> Jan  7 23:55:52 oa2 corosync[362]:   [TOTEM ] received message requesting
> test of ring now active
> Jan  7 23:55:52 oa2 corosync[362]:   [TOTEM ] received message requesting
> test of ring now active
>
> What could be the reason for this?
>
> Pacemaker 1.1.6, Corosync 1.4.2
>
> The relevant part of the config:
>
> Eth0 is on the 10.100.1.X subnet, eth1 is 192.168.100.X
>
> totem {
>     version: 2
>     secauth: off
>     threads: 0
>     rrp_mode: passive
>     interface {
>         ringnumber: 0
>         bindnetaddr: 10.100.1.255
>         mcastaddr: 226.100.40.1
>         mcastport: 4000
>     }
>     interface {
>         ringnumber: 1
>         bindnetaddr: 192.168.100.255
>         mcastaddr: 226.101.40.1
>         mcastport: 4000
>     }
>

Are the subnets /24 or higher (/23, /22, etc.)? Because as I see it,
you're using what would be the broadcast address on a /24 subnet,
which may cause issues.
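
For a /24 the bindnetaddr would normally be the network address, e.g.:

bindnetaddr: 10.100.1.0

and 192.168.100.0 for the second ring (assuming those really are /24
subnets).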

>
> }
>
> Thanks,
>
> Attila
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>



-- 
Dan Frincu
CCNA, RHCE

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] large cluster design questions

2012-01-06 Thread Dan Frincu
 I suggested splitting the cluster by purpose, this way for
MySQL nodes you install and configure as necessary, but don't do the
same on the rest of the nodes.

One other thing, as I see it, you want an N-to-N cluster, with any one
service being able to run on any node and to failover to any node.
Consider all of the services that need coordinated access to data, now
consider any node in the cluster can possibly run that service, which
further along means that you need all the nodes to have access to the
same shared data, so you're talking about a GFS2/OCFS2 cluster
spanning 45 nodes. I know I have a knack for stating the obvious, but
people most of the time say one thing and think another, so when you
reply with what they said, then all of a sudden, coming from someone
other than them, it sheds a different light on the matter.

Bottom line, split the nodes into clusters that match a common purpose.

There's bound to be more input on the matter; this is just my opinion.

HTH,
Dan

[1] http://oss.clusterlabs.org/pipermail/pacemaker/2012-January/012639.html

>
> Many thanks for your thoughts on this,
> Christian.
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

-- 
Dan Frincu
CCNA, RHCE

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Large cluster

2012-01-06 Thread Dan Frincu
Hi,

On Thu, Jan 5, 2012 at 6:43 PM, Graantik  wrote:
> Hi all,
>
> I have a task that I think can logically be implemented using a
> pacemaker/corosync cluster with many nodes (e.g. 15) and maybe a thousand or
> more resources. Most of the resources are parametrized processes controlled
> by a custom resource agent. The resources are added and removed dynamically,
> typically many (e.g. 100) at one time.
>
> My first tests in a VM environment show that - even after some tuning of
> lrmd max-children and custom-batch-limit, optimizing the RA and having the
> processes idle - adding so many resources in one step (xml based) appears to
> bring the cluster to its knees, i.e. nodes become unresponsive, DC and other
> nodes have very high load, and the operation takes an hour or longer.
>
> Does this mean that the design limit of this software/hardware is reached or
> are there ways like tuning or best practices to make such a scenario work?

In terms of performance testing on large clusters there is an article
that may be interesting to read
http://theclusterguy.clusterlabs.org/post/1241986422/large-cluster-performance

In the article it talks about using 10000 resources, so it's higher
than your use case; you can compare the timings that you have had with
the ones presented there and go from there.

Bear in mind that when dealing with so many resources and nodes it
might help to tweak certain things, such as the maximum message size
for corosync (the article mentions using 256k), timeouts in corosync
token might have to be increased, as high load on the systems may
delay replies in network traffic, and also having to sync the CIB onto
~15 nodes as you mentioned means that you _should_ use multicast,
switches must support igmp snooping and have it enabled and properly
configured, the entire cluster should be in a separate vlan, or have
some form of dedicated network, to ensure not only throughput but also
latency and to prevent interference of other network traffic, etc.
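
As an illustration, the token timeout lives in the totem section; the
value below is only an example, the default being 1000 ms:

totem {
    ...
    token: 10000
}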

>
> Are there known implementations of comparable size?

In terms of nodes, the most I know of are clusters of ~10-12 nodes; in
terms of resources, not that I know of.

HTH,
Dan

>
> Thanks
> Gerhard
>
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>



-- 
Dan Frincu
CCNA, RHCE

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] How to live migrate the kvm vm

2011-12-13 Thread Dan Frincu
Hi,

On Tue, Dec 13, 2011 at 11:13 AM, Qiu Zhigang  wrote:
> Hi,
>
>> -Original Message-----
>> From: Dan Frincu [mailto:df.clus...@gmail.com]
>> Sent: Tuesday, December 13, 2011 4:43 PM
>> To: The Pacemaker cluster resource manager
>> Subject: Re: [Pacemaker] How to live migrate the kvm vm
>>
>> Hi,
>>
>> On Tue, Dec 13, 2011 at 6:11 AM, Qiu Zhigang 
>> wrote:
>> > Hi,
>> >
>> > Thank you, you are right, I corrected the 'allow-migrate="true"', but now
>> > I found another problem when migrating: the migration failed.
>> > The following is the log.
>> >
>> > Dec 13 12:10:03 h10_151 kernel: type=1400 audit(1323749403.251:623):
>> > avc:  denied  { search } for  pid=27201 comm="virsh" name="libvirt"
>> > dev=dm-0 ino=2098071 scontext=unconfined_u:system_r:corosync_t:s0
>> > tcontext=system_u:object_r:virt_var_run_t:s0 tclass=dir Dec 13
>> > 12:10:04 h10_151 kernel: type=1400 audit(1323749404.067:624): avc:
>> > denied  { search } for  pid=27218 comm="VirtualDomain" name=""
>> > dev=0:1c ino=13825028 scontext=unconfined_u:system_r:corosync_t:s0
>> > tcontext=system_u:object_r:nfs_t:s0 tclass=dir Dec 13 12:10:04 h10_151
>> > kernel: type=1400 audit(1323749404.252:625): avc:  denied  { read }
>> > for  pid=27242 comm="virsh" name="random" dev=devtmpfs ino=3585
>> > scontext=unconfined_u:system_r:corosync_t:s0
>> > tcontext=system_u:object_r:random_device_t:s0 tclass=chr_file
>>
>> You need to take a look at the SELinux context.
>>
>> Regards,
>> Dan
>>
>
> I'm not familiar with SElinux context, but I have disabled selinux .
>
> [root@h10_151 ~]# cat /etc/sysconfig/selinux
>
> # This file controls the state of SELinux on the system.
> # SELINUX= can take one of these three values:
> #     enforcing - SELinux security policy is enforced.
> #     permissive - SELinux prints warnings instead of enforcing.
> #     disabled - No SELinux policy is loaded.
> SELINUX=disable
> # SELINUXTYPE= can take one of these two values:
> #     targeted - Targeted processes are protected,
> #     mls - Multi Level Security protection.
> SELINUXTYPE=targeted
>
> How can I solve this issue, or any other information you need to help me ?

Try getenforce on both nodes; it should return Disabled. If it doesn't,
you need to check that SELinux is disabled on both nodes and then
reboot the nodes.
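
Something like:

getenforce    # run on each node; should print: Disabled

Also note that the file you pasted says SELINUX=disable, while its own
comments list the valid values as enforcing, permissive and disabled,
so it may simply need to read SELINUX=disabled before the reboot.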

HTH,
Dan

>
>
> Best Regards,
>
>> >
>> > [root@h10_145 ~]# crm
>> > crm(live)# status
> ============
>> > Last updated: Tue Dec 13 12:09:06 2011
>> > Stack: openais
>> > Current DC: h10_145 - partition with quorum
>> > Version: 1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe
>> > 2 Nodes configured, 2 expected votes
>> > 2 Resources configured.
> ============
>> >
>> > Online: [ h10_151 h10_145 ]
>> >
>> >  test2  (ocf::heartbeat:VirtualDomain): Started h10_151 (unmanaged)
>> > FAILED
>> >  test1  (ocf::heartbeat:VirtualDomain): Started h10_145 (unmanaged)
>> > FAILED
>> >
>> > Failed actions:
>> >    test1_stop_0 (node=h10_145, call=19, rc=1, status=complete):
>> > unknown error
>> >    test2_stop_0 (node=h10_151, call=14, rc=1, status=complete):
>> > unknown error
>> >
>> > Best Regards,
>> >
>> >> -Original Message-
>> >> From: Arnold Krille [mailto:arn...@arnoldarts.de]
>> >> Sent: Monday, December 12, 2011 7:52 PM
>> >> To: The Pacemaker cluster resource manager
>> >> Subject: Re: [Pacemaker] How to live migrate the kvm vm
>> >>
>> >> Hi,
>> >>
>> >> On Monday 12 December 2011 11:22:51 邱志刚 wrote:
>> >> > I have a 2-node cluster of pacemaker. I want to migrate the kvm vm
>> >> > with the command "migrate", but I found the vm isn't migrated; actually
>> >> > it is shut down and then started on the other node. I checked the log
>> >> > and found the vm is stopped but not migrated.
>> >>
>> >> > How could I live migrate the vm ? The configuration :
>> >> > crm(live)configure# show
>> >> > primitive test1 ocf:heartbeat:VirtualDomain \
>> >> >     params config="/etc/libvirt/qemu/test1.xml"
>> >> > hypervisor="qemu:///system" \
>> >> >     meta allow-migrate="ture" priority="10

Re: [Pacemaker] How to live migrate the kvm vm

2011-12-13 Thread Dan Frincu
Hi,

On Tue, Dec 13, 2011 at 6:11 AM, Qiu Zhigang  wrote:
> Hi,
>
> Thank you, you are right, I corrected the 'allow-migrate="true"', but now I
> found another problem when migrating: the migration failed.
> The following is the log.
>
> Dec 13 12:10:03 h10_151 kernel: type=1400 audit(1323749403.251:623): avc:  
> denied  { search } for  pid=27201 comm="virsh" name="libvirt" dev=dm-0 
> ino=2098071 scontext=unconfined_u:system_r:corosync_t:s0 
> tcontext=system_u:object_r:virt_var_run_t:s0 tclass=dir
> Dec 13 12:10:04 h10_151 kernel: type=1400 audit(1323749404.067:624): avc:  
> denied  { search } for  pid=27218 comm="VirtualDomain" name="" dev=0:1c 
> ino=13825028 scontext=unconfined_u:system_r:corosync_t:s0 
> tcontext=system_u:object_r:nfs_t:s0 tclass=dir
> Dec 13 12:10:04 h10_151 kernel: type=1400 audit(1323749404.252:625): avc:  
> denied  { read } for  pid=27242 comm="virsh" name="random" dev=devtmpfs 
> ino=3585 scontext=unconfined_u:system_r:corosync_t:s0 
> tcontext=system_u:object_r:random_device_t:s0 tclass=chr_file

You need to take a look at the SELinux context.

Regards,
Dan

>
> [root@h10_145 ~]# crm
> crm(live)# status
> ============
> Last updated: Tue Dec 13 12:09:06 2011
> Stack: openais
> Current DC: h10_145 - partition with quorum
> Version: 1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe
> 2 Nodes configured, 2 expected votes
> 2 Resources configured.
> ============
>
> Online: [ h10_151 h10_145 ]
>
>  test2  (ocf::heartbeat:VirtualDomain): Started h10_151 (unmanaged) FAILED
>  test1  (ocf::heartbeat:VirtualDomain): Started h10_145 (unmanaged) FAILED
>
> Failed actions:
>    test1_stop_0 (node=h10_145, call=19, rc=1, status=complete): unknown error
>    test2_stop_0 (node=h10_151, call=14, rc=1, status=complete): unknown error
>
> Best Regards,
>
>> -Original Message-
>> From: Arnold Krille [mailto:arn...@arnoldarts.de]
>> Sent: Monday, December 12, 2011 7:52 PM
>> To: The Pacemaker cluster resource manager
>> Subject: Re: [Pacemaker] How to live migrate the kvm vm
>>
>> Hi,
>>
>> On Monday 12 December 2011 11:22:51 邱志刚 wrote:
>> > I have a 2-node cluster of pacemaker. I want to migrate the kvm vm with
>> > the command "migrate", but I found the vm isn't migrated; actually it is
>> > shut down and then started on the other node. I checked the log and found
>> > the vm is stopped but not migrated.
>>
>> > How could I live migrate the vm ? The configuration :
>> > crm(live)configure# show
>> > primitive test1 ocf:heartbeat:VirtualDomain \
>> >     params config="/etc/libvirt/qemu/test1.xml"
>> > hypervisor="qemu:///system" \
>> >     meta allow-migrate="ture" priority="100" target-role="Started"
>> > is-managed="true" \
>> >     op start interval="0" timeout="120s" \
>> >     op stop interval="0" timeout="120s" \
>> >     op monitor interval="10s" timeout="30s" depth="0" \
>> >     op migrate_from interval="0" timeout="120s" \
>> >     op migrate_to interval="0" timeout="120"
>>
>> I hope that "ture" is only a typo when writing the email. Otherwise it's
>> probably the reason why your machine does a stop-start instead of a nice
>> migration.
>> Try with 'allow-migrate="true"' and see if that helps.
>>
>> Have fun,
>>
>> Arnold
>
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



-- 
Dan Frincu
CCNA, RHCE

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Syntax highlighting in vim for crm configure edit

2011-12-09 Thread Dan Frincu
Hi,

On Mon, Dec 5, 2011 at 3:54 PM, Dejan Muhamedagic  wrote:
> Hi,
>
> On Tue, Nov 22, 2011 at 07:14:24PM +0200, Dan Frincu wrote:
>> Hi,
>>
>> On Tue, Nov 15, 2011 at 11:47 AM, Raoul Bhatia [IPAX]  
>> wrote:
>> > hi!
>> >
>> > On 2011-08-19 16:28, Dan Frincu wrote:
>> >>
>> >> Hi,
>> >>
>> >> On Thu, Aug 18, 2011 at 5:53 PM, Digimer  wrote:
>> >>>
>> >>> On 08/18/2011 10:39 AM, Trevor Hemsley wrote:
>> >>>>
>> >>>> Hi all
>> >>>>
>> >>>> I have attached a first stab at a vim syntax highlighting file for 'crm
>> >>>> configure edit'
>> >>>>
>> >>>> To activate this, I have added 'filetype plugin on' to my /root/.vimrc
>> >>>> then created /root/.vim/{ftdetect,ftplugin}/pcmk.vim
>> >>>>
>> >>>> In /root/.vim/ftdetect/pcmk.vim I have the following content
>> >>>>
>> >>>> au BufNewFile,BufRead /tmp/tmp* set filetype=pcmk
>> >>>>
>> >>>> but there may be a better way to make this happen. /root/.vim/pcmk.vim
>> >>>> is the attached file.
>> >>>>
>> >>>> Comments (not too nasty please!) welcome.
>> >>
>> >> I've added a couple of extra keywords to the file, to cover a couple
>> >> more use cases. Other than that, great job.
>> >
>> > will this addition make it into some package(s)?
>> > would it be right to ship this vim syntax file with crm?
>>
>> In the hope it will be a part of crm, I've written a patch for this.
>> Applying the patch over cibconfig.py and utils.py on Pacemaker 1.1.5
>> and adding the pcmk.vim file to the vim syntax folder (for Debian
>> Squeeze it's /usr/share/vim/vim72/syntax) gives access to syntax
>> highlighting in crm configure edit, if using vi/vim as editor.
>>
>> Original work on pcmk.vim by Trevor Hemsley ,
>> a couple of additions by me.
>>
>> Please review it and add a Signed-Off line if it's ok.
>
> Just tried it out, and when I do :set filetype=pcmk, vim spews at
> me this:
>
> Error detected while processing /usr/share/vim/vim72/syntax/synload.vim:
> line   58:
E127: Cannot redefine function <SNR>3_SynSet: It is in use
E127: Cannot redefine function <SNR>3_SynSet: It is in use
E127: Cannot redefine function <SNR>3_SynSet: It is in use
E127: Cannot redefine function <SNR>3_SynSet: It is in use
> Error detected while processing /usr/share/vim/vim72/syntax/nosyntax.vim:
> line   21:
> E218: autocommand nesting too deep
> Error detected while processing /usr/share/vim/vim72/syntax/synload.vim:
> line   58:
E127: Cannot redefine function <SNR>3_SynSet: It is in use
> Error detected while processing /usr/share/vim/vim72/syntax/syntax.vim:
> line   40:
> E218: autocommand nesting too deep
>
> BTW, I just copied the pcmk.vim file to ~/.vim/syntax.
>

Well, first of all, the patch was meant to be applied to the source, I
did not mention this before. To apply it on the running system just
use the patch from http://pastebin.com/PWpuzQ4m

The patch also assumes the pcmk.vim file is copied to
/usr/share/vim/vim72/syntax/pcmk.vim
If not, the path must be adjusted to match the location of pcmk.vim.
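
In short, something along these lines (the paths are assumptions and
vary per distribution; on Debian Squeeze the crm shell modules live
under /usr/lib/python2.6/dist-packages/crm):

cp pcmk.vim /usr/share/vim/vim72/syntax/pcmk.vim
cd /usr/lib/python2.6/dist-packages/crm
patch -p3 < /path/to/the-pastebin-patch.diff
# the -p level depends on the paths inside the patch file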

Then when opening crm configure edit the syntax highlighting is
applied. Your test and the respective errors come from not applying
the patch.

> Otherwise, the output looks fine. There are a few differences to
> the configure show output:
>
> - quotes are red along with the value
> - ids are green whereas in configure show they are normal
> - id references are light blue and in configure show they are green
> - scores are red and in configure show violet
> - roles/actions in constraints red and in configure show normal
>
> There are probably a few more differences.
>

Indeed, not perfect, however it's better than nothing and could be
improved over time.

Regards,
Dan

> Cheers,
>
> Dejan
>
>
>> Regards,
>> Dan
>>
>> p.s.: many thanks to everyone for the input received on IRC.
>>
>> >
>> > thanks,
>> > raoul
>> > --
>> > 
>> > DI (FH) Raoul Bhatia M.Sc.          email.          r.bha...@ipax.at
>> > Technischer Leiter
>> >
>> > IPAX - Aloy Bhatia Hava OG          web.          http://www.ipax.at
>> > Barawitzkagasse 10/2/2/11           email.            off...@ipax.at
>> > 1190 Wien 

Re: [Pacemaker] Syntax highlighting in vim for crm configure edit

2011-11-22 Thread Dan Frincu
Hi,

On Tue, Nov 15, 2011 at 11:47 AM, Raoul Bhatia [IPAX]  wrote:
> hi!
>
> On 2011-08-19 16:28, Dan Frincu wrote:
>>
>> Hi,
>>
>> On Thu, Aug 18, 2011 at 5:53 PM, Digimer  wrote:
>>>
>>> On 08/18/2011 10:39 AM, Trevor Hemsley wrote:
>>>>
>>>> Hi all
>>>>
>>>> I have attached a first stab at a vim syntax highlighting file for 'crm
>>>> configure edit'
>>>>
>>>> To activate this, I have added 'filetype plugin on' to my /root/.vimrc
>>>> then created /root/.vim/{ftdetect,ftplugin}/pcmk.vim
>>>>
>>>> In /root/.vim/ftdetect/pcmk.vim I have the following content
>>>>
>>>> au BufNewFile,BufRead /tmp/tmp* set filetype=pcmk
>>>>
>>>> but there may be a better way to make this happen. /root/.vim/pcmk.vim
>>>> is the attached file.
>>>>
>>>> Comments (not too nasty please!) welcome.
>>
>> I've added a couple of extra keywords to the file, to cover a couple
>> more use cases. Other than that, great job.
>
> will this addition make it into some package(s)?
> would it be right to ship this vim syntax file with crm?

In the hope it will be a part of crm, I've written a patch for this.
Applying the patch over cibconfig.py and utils.py on Pacemaker 1.1.5
and adding the pcmk.vim file to the vim syntax folder (for Debian
Squeeze it's /usr/share/vim/vim72/syntax) gives access to syntax
highlighting in crm configure edit, if using vi/vim as editor.

Original work on pcmk.vim by Trevor Hemsley ,
a couple of additions by me.

Please review it and add a Signed-Off line if it's ok.

Regards,
Dan

p.s.: many thanks to everyone for the input received on IRC.

>
> thanks,
> raoul
> --
> 
> DI (FH) Raoul Bhatia M.Sc.          email.          r.bha...@ipax.at
> Technischer Leiter
>
> IPAX - Aloy Bhatia Hava OG          web.          http://www.ipax.at
> Barawitzkagasse 10/2/2/11           email.            off...@ipax.at
> 1190 Wien                           tel.               +43 1 3670030
> FN 277995t HG Wien                  fax.            +43 1 3670030 15
> 
>



-- 
Dan Frincu
CCNA, RHCE
From d3ab2ab159137b271382db8d0edeef6d69325894 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dan=20Fr=C3=AEncu?= 
Date: Tue, 22 Nov 2011 18:50:10 +0200
Subject: [PATCH][BUILD] Low: extra: Add syntax highlighting for crm configure edit
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit


Signed-off-by: Dan Frîncu 
---
 shell/modules/cibconfig.py |1 +
 shell/modules/utils.py |7 +--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/shell/modules/cibconfig.py b/shell/modules/cibconfig.py
index 9cc9751..49b4b51 100644
--- a/shell/modules/cibconfig.py
+++ b/shell/modules/cibconfig.py
@@ -128,6 +128,7 @@ class CibObjectSet(object):
             except IOError, msg:
                 common_err(msg)
                 break
+            s += "\n# vim: set filetype=pcmk :\n"
             s = ''.join(f)
             f.close()
             if hash(s) == filehash: # file unchanged
diff --git a/shell/modules/utils.py b/shell/modules/utils.py
index b57aa54..00013c6 100644
--- a/shell/modules/utils.py
+++ b/shell/modules/utils.py
@@ -158,7 +158,7 @@ def str2tmp(s):
     Write the given string to a temporary file. Return the name
     of the file.
     '''
-    fd,tmp = mkstemp()
+    fd,tmp = mkstemp(suffix=".pcmk")
     try: f = os.fdopen(fd,"w")
     except IOError, msg:
         common_err(msg)
@@ -317,7 +317,10 @@ def edit_file(fname):
         return
     if not user_prefs.editor:
         return
-    return ext_cmd("%s %s" % (user_prefs.editor,fname))
+    if user_prefs.editor == "vim" or user_prefs.editor == "vi":
+        return ext_cmd("%s %s -u /usr/share/vim/vim72/syntax/pcmk.vim" % (user_prefs.editor,fname))
+    else:
+        return ext_cmd("%s %s" % (user_prefs.editor,fname))

 def page_string(s):
     'Write string through a pager.'
-- 
1.7.0.4



pcmk.vim
Description: Binary data
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] killing corosync leaves crmd, stonithd, lrmd, cib and attrd to hog up the cpu

2011-11-14 Thread Dan Frincu
Hi,

On Mon, Nov 14, 2011 at 1:32 PM, ihjaz Mohamed  wrote:
> Hi All,
> As part of some robustness test for my cluster, I tried killing the corosync
> process using kill -9 <pid>. After this I see that the pacemakerd service is
> stopped but the processes crmd, stonithd, lrmd, cib and attrd are still
> running and are hogging up the cpu.

I have seen this kind of testing before and I have to say I don't
consider it the recommended way of testing the cluster stack's
"robustness". Pacemaker processes rely on corosync for proper
functioning. You kill corosync and then want to "cleanup" the
processes? You have to go through a lot more literature in order to
understand how this cluster stack works.

For the Master Control Process, how it works and other related
information (which is related to what you are experiencing), see
http://theclusterguy.clusterlabs.org/post/907043024/introducing-the-pacemaker-master-control-process-for

The essential guide you need is
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/

HTH,
Dan

>
> top - 06:26:51 up  2:01,  4 users,  load average: 12.04, 12.01, 11.98
> Tasks: 330 total,  13 running, 317 sleeping,   0 stopped,   0 zombie
> Cpu(s):  7.1%us, 17.1%sy,  0.0%ni, 75.6%id,  0.1%wa,  0.0%hi,  0.0%si,
> 0.0%st
> Mem:   8015444k total,  4804412k used,  3211032k free,    54800k buffers
> Swap: 10256376k total,    0k used, 10256376k free,  1604464k cached
>
>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  2053 hacluste  RT   0 90492 3324 2476 R 100.0  0.0 113:40.61 crmd
>  2047 root  RT   0 81480 2108 1712 R 99.8  0.0 113:40.43 stonithd
>  2048 hacluste  RT   0 83404 5260 2992 R 99.8  0.1 113:40.90 cib
>  2050 hacluste  RT   0 85896 2388 1952 R 99.8  0.0 113:40.43 attrd
>  5018 root  20   0 8787m 345m  56m S  2.0  4.4   0:56.95 java
> 19017 root  20   0 15068 1252  796 R  2.0  0.0   0:00.01 top
>     1 root  20   0 19232 1444 1156 S  0.0  0.0   0:01.71 init
>     2 root  20   0 0    0    0 S  0.0  0.0   0:00.00 kthreadd
>     3 root  RT   0 0    0    0 S  0.0  0.0   0:00.00 migration/0
>     4 root  20   0 0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0
>
>
> Is there a way to clean up these processes? Or do I need to kill them one by
> one before respawning corosync?
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
>



-- 
Dan Frincu
CCNA, RHCE

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] pacemaker compatibility

2011-10-19 Thread Dan Frincu
Hi,

On Wed, Oct 19, 2011 at 8:12 AM,   wrote:
> thank you
> Andreas
>
> Now I am facing core dump issue with
> corosync-1.4.2
> cluster-glue-1.0.7
> pacemaker-1.0.11
>

To report a crash of corosync, please follow this guide =>
http://corosync.org/doku.php?id=faq:crash
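
In short it boils down to getting a backtrace out of the core file,
roughly as follows (corosync normally leaves cores in
/var/lib/corosync; the pid in the file name will differ):

gdb /usr/sbin/corosync /var/lib/corosync/core.12345
(gdb) thread apply all bt

and attaching the output to the report.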

Regards,
Dan

> In many scenarios I get a core dump during the corosync start operation for 2
> rings, for example:
>
> 1. Configure 2 rings: ring0 10.16.16.0, ring1 192.168.1.0
>    ring1 network (ifconfig eth1 192.168.1.14 down) is down before corosync
>    startup
>
> 2. Configure 2 rings: ring0 10.16.16.0, ring1 192.168.1.0 (invalid network)
>    ring1 network (ifconfig eth1 193.167.1.14 up) is different, meaning the
>    network is not present for ring1; start corosync
>
> All the core dump are generated from the same files.
>
> Core was generated by the 'corosync' program, terminated with signal 6
> C file where the problem was encountered:
> totemsrp.c.2526
> totemsrp.c.3545
> totemrrp.c.1036
> totemrrp.c.1736
> totemudp.c.1252
> coropoll.c.513
> main.c.1846
>
> With corosync1.2 also we are facing core dump issue.
>
> Is there any way, to avoid only corosync core dump.
>
> On Tue, October 18, 2011 2:05 pm, Andreas Kurz wrote:
>> Hello,
>>
>>
>> On 10/18/2011 08:11 AM, manish.gu...@ionidea.com wrote:
>>
>>> Hi,
>>>
>>>
>>> I am using corosync.1.2.1. I want  to upgrade  corosync  from 1.2 to
>>> 1.4.2.
>>>
>>>
>>>
>>> please can you let me know which version of cluster-glue and pacemekr
>>> are compatiable with corosync1.4.2
>>>
>>> Currentely with corosync1.4.2 I am using pacemaker 1.0.10 and
>>> cluster-glue1.0.3 and I am getting error ..
>>
>> You should also upgrade Pacemaker to 1.0.11 and especially cluster-glue
>> to latest version 1.0.7 ... though this old versions might not be the cause
>> for your problems here.
>>
>>>
>>> service failed to load pacemaker ...
>>
>> Hard to say without having a look at your corosync configuration.
>>
>>
>> Regards,
>> Andreas
>>
>>
>> --
>> Need help with Pacemaker?
>> http://www.hastexo.com/now
>>
>>
>>>
>>>
>>> Regards
>>> Manish
>>>
>>>
>>>
>>> ___
>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>  Bugs:
>>> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemak
>>> er
>>
>>
>>
>> ___
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs:
>> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>>
>>
>
>
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: 
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>



-- 
Dan Frincu
CCNA, RHCE

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Resource starts on wrong node ?

2011-09-21 Thread Dan Frincu
Hi,

On Wed, Sep 21, 2011 at 3:03 PM, Hans Lammerts  wrote:
>  Dan,
>
>
>
> Thanks for the swift reply.
>
> I didn't know pacemaker was sort of loadbalancing across nodes.
>
> Maybe I should read the documentation in more detail.
>
>
>
> Regarding the versions:
>
> I would like to have the newest versions, but what I've done until now is
> just install what's available
>
> from the Centos repositories.
>
> Indeed I would like to upgrade since I also sometimes experience the issue
> that several heartbeat daemons
>
> start looping when I change something in the config. Something that's
> supposed to be fixed in a higher level
>
> of corosync/heartbeat/pacemaker
>

Have a look at http://clusterlabs.org/wiki/RHEL as to how to add repos
for EL6. Unfortunately, afaics, only Pacemaker is available as a newer
version, 1.1.5; corosync is still at 1.2.3.

I'd also recommend building corosync RPM's from the tarball
(http://www.corosync.org/), but that's just my personal preference;
some prefer pre-built binaries.

>
>
> About what you said: Is there a limited number of resources that can run on
> one node, before pacemaker decides it is going to run a subsequent resource
> on another node ?

The algorithm is basically round robin. By default it doesn't make any
assumptions about the "importance" of the resources: the first resource
goes to the first node, the second to the second node, the third to the
first node, the fourth to the second node, and so on; it's round robin,
like I said.

>
> Wouldn't it be best to always use the colocation and order directives to
> prevent this from happening ?
>

It all depends on the purpose of the cluster; if it fits the need for
your setup, then yes, use colocation and ordering. There really isn't
a "one size fits all" scenario.

Regards,
Dan

>
>
> Thanks again,
>
>
>
> Hans
>
>
> -Original message-
> To: The Pacemaker cluster resource manager ;
> From: Dan Frincu 
> Sent: Wed 21-09-2011 12:44
> Subject: Re: [Pacemaker] Resource starts on wrong node ?
> Hi,
>
> On Wed, Sep 21, 2011 at 1:02 PM, Hans Lammerts  wrote:
>> Hi all,
>>
>>
>>
>> Just started to configure a two node cluster (Centos 6) with drbd
>> 8.4.0-31.el6,
>>
>> corosync 1.2.3 and pacemaker 1.1.2.
>
> Strange choice of versions, if it's a new setup, why don't you go for
> corosync 1.4.1 and pacemaker 1.1.5?
>
>>
>> I created three DRBD filesystems, and started to add them in the crm
>> config
>> one by one.
>>
>> Everything went OK. After adding these resources they start on node1, and
>> when I set node1
>>
>> in standby, these three DRBD resources failover nicely to the second node.
>> And vice versa.
>>
>> So far so good.
>>
>>
>>
>> Next, I added one extra resource, that is supposed to put an IP alias on
>> eth0.
>>
>> This also works, but strangely enough the alias is set on eth0 of the
>> second
>> node, where I would have
>>
>> expected it to start on the first node (just as the three drbd resources
>> did).
>>
>> Why the does Pacemaker decide that this resource is to be started on
>> the
>> second node ? I cannot grasp
>>
>> the reason why.
>
> Because it tries to load balance resources on available nodes. You
> have several resources running on one node, and didn't specify any
> restrictions on the mysqlip, therefore it chose the second node as it
> had less resources on it. You override the behavior with constraints.
> See below.
>
>>
>> Hope anyone can tell me what I'm doing wrong.
>>
>>
>>
>> Thanks,
>>
>> Hans
>>
>>
>>
>> Just to be sure, I'll show my config below:
>>
>>
>>
>> node cl1 \
>>
>>     attributes standby="off"
>>
>> node cl2 \
>>
>>     attributes standby="off"
>>
>> primitive drbd0 ocf:linbit:drbd \
>>
>>     params drbd_resource="mysql" drbdconf="/etc/drbd.conf" \
>>
>>     op start interval="0" timeout="240s" \
>>
>>     op monitor interval="20s" timeout="20s" \
>>
>>     op stop interval="0" timeout="100s"
>>
>> primitive drbd1 ocf:linbit:drbd \
>>
>>     params drbd_resource="www" drbdconf="/etc/drbd.conf" \
>>
>>     op start interval="0" timeout="240s" \
>>
>>     op monitor interval=&

Re: [Pacemaker] Resource starts on wrong node ?

2011-09-21 Thread Dan Frincu
e"
>
> ms ms_drbd2 drbd2 \
>
>     meta master-max="1" master-node-max="1" clone-max="2"
> clone-node-max="1" notify="true"
>
> colocation fs2_on_drbd inf: wwwfs ms_drbd1:Master
>
> colocation fs3_on_drbd inf: zarafafs ms_drbd2:Master
>
> colocation fs_on_drbd inf: mysqlfs ms_drbd0:Master
>
> order fs2_after_drbd inf: ms_drbd1:promote wwwfs:start
>
> order fs3_after_drbd inf: ms_drbd2:promote zarafafs:start
>
> order fs_after_drbd inf: ms_drbd0:promote mysqlfs:start
>

You either set a location constraint for mysqlip or use a colocation
and ordering constraint for it.

e.g.: colocation mysqlip_on_drbd inf: mysqlip ms_drbd0:Master
order mysqlip_after_drbd inf: ms_drbd0:promote mysqlip:start

> property $id="cib-bootstrap-options" \
>
>     dc-version="1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe" \
>
>     cluster-infrastructure="openais" \
>
>     expected-quorum-votes="2" \
>
>     no-quorum-policy="ignore" \
>
>     stonith-enabled="false"
>
> rsc_defaults $id="rsc-options" \
>
>     resource_stickyness="INFINITY" \

I wouldn't set INFINITY, as it will cause problems; I'd give it a value
of 500 or 1000.
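
e.g. (note the option is spelled resource-stickiness, not
resource_stickyness):

rsc_defaults $id="rsc-options" \
    resource-stickiness="1000" \
    migration-threshold="1"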

Regards,
Dan

>
>     migration-threshold="1"
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
>



-- 
Dan Frincu
CCNA, RHCE

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Errors When Loading OCF

2011-09-20 Thread Dan Frincu
Hi,

On Mon, Sep 19, 2011 at 7:21 PM, Nick Khamis  wrote:
> Hello Everyone,
>
> I have been experiencing some problems getting pacemaker going with
> DRBD and MySQL
>
> The Config:
>
> primitive drbd_mysql ocf:linbit:drbd \
>                    params drbd_resource="mysql" \
>                    op monitor interval="15s"
> ms ms_drbd_mysql drbd_mysql \
>                    meta master-max="1" master-node-max="1" \
>                         clone-max="2" clone-node-max="1" \
>                         notify="true"
> primitive fs_mysql ocf:heartbeat:Filesystem \
>                    params device="/dev/drbd/by-res/mysql" \
>                      directory="/var/lib/mysql" fstype="ext3"
> primitive ip_mysql ocf:heartbeat:IPaddr2 \
>                    params ip="192.168.2.100" nic="eth1"
> primitive mysqld lsb:mysqld

I strongly recommend an OCF compliant RA for this (such as
https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/mysql),
not the LSB script.
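
A minimal sketch of such a primitive (the paths are assumptions, adjust
them to your distribution):

primitive mysqld ocf:heartbeat:mysql \
    params binary="/usr/bin/mysqld_safe" config="/etc/mysql/my.cnf" \
        datadir="/var/lib/mysql" \
    op start interval="0" timeout="120" \
    op stop interval="0" timeout="120" \
    op monitor interval="30" timeout="30"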

> group mysql fs_mysql ip_mysql mysqld
> colocation mysql_on_drbd \
>                      inf: mysql ms_drbd_mysql:Master
> order mysql_after_drbd \
>                      inf: ms_drbd_mysql:promote mysql:start
> property $id="cib-bootstrap-options" \
>        no-quorum-policy="ignore" \
>        stonith-enabled="false" \
>        expected-quorum-votes="2" \

I'm assuming you're upgrading (or have upgraded) the cluster stack from
a previous version; the dc-version is not the one provided by 1.1.5, as
you've mentioned below.

>        dc-version="1.0.4-2ec1d189f9c23093bf9239a980534b661baf782d" \
>        cluster-infrastructure="openais"
>
> The Errors:
>
> lrmadmin[2302]: 2011/09/19_11:41:26 ERROR:
> lrm_get_rsc_type_metadata(578): got a return code HA_FAIL from a reply
> message of rmetadata with function get_ret_from_msg.
> ERROR: ocf:linbit:drbd: could not parse meta-data:
> ERROR: ocf:linbit:drbd: no such resource agent

Check the /usr/lib/ocf/resource.d/linbit directory for the presence of
the drbd RA. If it isn't there, you might have done something wrong
while compiling DRBD.
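
e.g., both of these should succeed if the agent is installed properly:

ls -l /usr/lib/ocf/resource.d/linbit/drbd
crm ra info ocf:linbit:drbd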

> lrmadmin[2333]: 2011/09/19_11:41:26 ERROR:
> lrm_get_rsc_type_metadata(578): got a return code HA_FAIL from a reply
> message of rmetadata with function get_ret_from_msg.
> ERROR: lsb:mysqld: could not parse meta-data:
> ERROR: lsb:mysqld: no such resource agent

Could be mysql, not mysqld. Use a RA instead (see above).

> ERROR: object mysqld does not exist
> ERROR: object drbd_mysql does not exist
> ERROR: syntax in primitive: master-max=1 master-node-max=1 clone-max=2
> clone-node-max=1 notify=true
>
>
> The "ERROR: syntax in primitive: master-max=1 master-node-max=1
> clone-max=2 clone-node-max=1 notify=true" could be resolved by adding
> a trailing backslash to:
>

You're missing the RA for DRBD, which means there can be no ms
resource, which in turn you can't reference in a group. The trailing
backslash allowing this seems more of a bug than a feature.

> group mysql fs_mysql ip_mysql mysqld
>
> The examples found both miss the slash:
>
> http://www.drbd.org/docs/about/    "Adding a DRBD-backed service to
> the cluster configuration"
> http://www.clusterlabs.org/wiki/DRBD_MySQL_HowTo
>

As they should, the trailing backslash within the crm shell means that
it's expecting input on the next line (not having input on the next
line should result in an error, therefore my mention of this possibly
being a bug).

> Environemnt:
> DRBD and Cluster Stack are all the latest versions downloaded and
> built from source.
> DRBD: version: 8.3.7
> CRM: 1.1.6

You mean Pacemaker 1.1.6.

>
> DRBD Meta Data: /dev/drbd0/by-res/r0.res
> OCF RA: /usr/lib/ocf/resource.d/linbit/drbd
> MySQL RA: /usr/lib/ocf/resource.d/heartbeat/mysql?
> /etc/init.d/mysql starts fine...

No doubt there.

>
> I  just noticed "dc-version", should this match "Version:
> 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" returned by crm?
>

Yes

> Finally, where is the best source of up-to-date documenation for
> Cluster Glue and Resource Agents.

Documentation regarding ... installation, configuration, etc.? Here are
a couple of useful links.

Resource agents
http://www.linux-ha.org/wiki/Resource_Agents
http://linux-ha.org/wiki/OCF_Resource_Agents

Dev guide for OCF RA's
http://www.linux-ha.org/doc/dev-guides/ra-dev-guide.html

Resource agents repo
https://github.com/ClusterLabs/resource-agents

Cluster glue repo
http://hg.linux-ha.org/glue/

HTH,
Dan

>
> Thanks in Advnace,
>
> Nick.
>
> 

Re: [Pacemaker] Compile Error on Debian

2011-09-16 Thread Dan Frincu
Hi,

On Fri, Sep 16, 2011 at 10:56 AM, Dejan Muhamedagic  wrote:
> Hi,
>
> On Thu, Sep 15, 2011 at 06:06:31PM -0400, Nick Khamis wrote:
>> Hello Everyone,
>>
>> Using tip 1.0.7 I get:
>>
>> pes -Wwrite-strings -ansi -D_GNU_SOURCE -DANSI_ONLY -Werror -MT
>> pils.lo -MD -MP -MF .deps/pils.Tpo -c pils.c  -fPIC -DPIC -o
>> .libs/pils.o
>> cc1: warnings being treated as errors
>> In file included from /usr/include/glib-2.0/glib/gasyncqueue.h:34,
>>                  from /usr/include/glib-2.0/glib.h:34,
>>                  from pils.c:34:
>> /usr/include/glib-2.0/glib/gthread.h: In function 'g_once_init_enter':
>> /usr/include/glib-2.0/glib/gthread.h:348: error: cast discards
>> qualifiers from pointer target type
>
Applying the following patch fixes it:
http://bugzilla-attachments.gnome.org/attachment.cgi?id=158740
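
Something along these lines should do it (a sketch; check the paths in
the patch header and adjust the -p level accordingly):

wget -O gthread.patch 'http://bugzilla-attachments.gnome.org/attachment.cgi?id=158740'
patch -d /usr/include/glib-2.0 -p1 --dry-run < gthread.patch
patch -d /usr/include/glib-2.0 -p1 < gthread.patch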

Regards,
Dan

> This seems to be an issue in glib-2.0. You can also
> configure with enable_fatal_warnings=no.
>
> Thanks,
>
> Dejan
>
>> make[2]: *** [pils.lo] Error 1
>> make[2]: Leaving directory
>> `/usr/local/src/Reusable-Cluster-Components-glue--glue-1.0.7/lib/pils'
>> make[1]: *** [all-recursive] Error 1
>> make[1]: Leaving directory
>> `/usr/local/src/Reusable-Cluster-Components-glue--glue-1.0.7/lib'
>> make: *** [all-recursive] Error 1
>> root@pace1:/usr/local/src/Reusable-Cluster-Components-glue--glue-1.0.7#
>> apt-get install pils
>> Reading package lists... Done
>> Building dependency tree
>> Reading state information... Done
>> E: Unable to locate package pils
>> root@pace1:/usr/local/src/Reusable-Cluster-Components-glue--glue-1.0.7#
>> apt-get install libpils
>> Reading package lists... Done
>> Building dependency tree
>> Reading state information... Done
>> E: Unable to locate package libpils
>>
>>
>> Thanks in Advance,
>>
>> Nick
>>
>> ___
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: 
>> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
> _______
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: 
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>



-- 
Dan Frincu
CCNA, RHCE

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Syntax highlighting in vim for crm configure edit

2011-08-19 Thread Dan Frincu
Hi,

On Thu, Aug 18, 2011 at 5:53 PM, Digimer  wrote:
> On 08/18/2011 10:39 AM, Trevor Hemsley wrote:
>> Hi all
>>
>> I have attached a first stab at a vim syntax highlighting file for 'crm
>> configure edit'
>>
>> To activate this, I have added 'filetype plugin on' to my /root/.vimrc
>> then created /root/.vim/{ftdetect,ftplugin}/pcmk.vim
>>
>> In /root/.vim/ftdetect/pcmk.vim I have the following content
>>
>> au BufNewFile,BufRead /tmp/tmp* set filetype=pcmk
>>
>> but there may be a better way to make this happen. /root/.vim/pcmk.vim
>> is the attached file.
>>
>> Comments (not too nasty please!) welcome.

I've added a couple of extra keywords to the file, to cover a couple
more use cases. Other than that, great job.

Regards,
Dan

>
> I would love to see proper support added for CRM syntax highlighting
> added to vim. I will give this is a test and write back in a bit.
>
> --
> Digimer
> E-Mail:              digi...@alteeve.com
> Freenode handle:     digimer
> Papers and Projects: http://alteeve.com
> Node Assassin:       http://nodeassassin.org
> "At what point did we forget that the Space Shuttle was, essentially,
> a program that strapped human beings to an explosion and tried to stab
> through the sky with fire and math?"
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: 
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>



-- 
Dan Frincu
CCNA, RHCE


pcmk.vim
Description: Binary data
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] ocf:heartbeat:Filesystem doesn't work via corosync

2011-08-18 Thread Dan Frincu
output: (fs_mysql:start:stdout) 
> Disk write-protected; use the -n option to do a read-only#012check of the 
> device.
> Aug 17 12:35:20 gila lrmd: [24754]: info: RA output: (fs_mysql:start:stderr) 
> fsck.ext4: Read-only file system while trying to open /dev/drbd0#015
>
>
> Any help would be greatly appreciated.
>
> Thanks,
> Cotton Tenney
> Systems Administrator
> Rogers Software Development
>
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: 
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>



-- 
Dan Frincu
CCNA, RHCE

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Ordering and Colocation

2011-08-15 Thread Dan Frincu
Hi,

On Mon, Aug 15, 2011 at 3:33 AM, Curtis  wrote:
> On 15/08/11 10:23, Curtis wrote:
>>
>> Greetings,
>> I've been wrestling with this configuration for a few days now as I
>> slowly climb the learning curve of Pacemaker.
>
>
> Further details [sorry]-- versions.  All are from debian squeeze
>
> Pacemaker: 1.0.9
> Corosync: 1.2.1
> Cluster Glue: 1.0.6
>
>> My situation is as follows:
>>
>> I have 2 nodes, with 3 layers of resources:
>> drbd->lvm->publish
>>
>> They must run on both nodes, but each service is dependant only on those
>> on the same node.
>>
>> If drbd is not Master, lvm can't start.
>> If lvm isn't started, publish can't start.
>>
>> Now, from talking with beekhof on IRC, All I need is ordering and
>> colocation. This has worked for bringing it up, but when I, say, stop
>> LVM... the publishing doesn't stop.

How do you stop LVM? On what node?
Are you running DRBD dual-primary by any chance?
STONITH configured? And enabled? And tested?

Regards,
Dan

>>
>> Config [sorry if XML is preferred]:
>>
>> primitive drbd_prim ocf:linbit:drbd \
>> params drbd_resource="raid"
>> primitive lvm_prim ocf:heartbeat:LVM \
>> params volgroupname="raid"
>> primitive publish_prim ocf:iomax:scst \
>> prams 
>> ms drbd drbd_prim \
>> meta master-max="2" master-node-max="1" clone-max="2" clone-node-max="2"
>> notify="true"
>> clone lvm lvm_prim \
>> meta globally-unique="true" clone-max="2" clone-node-max="1"
>> clone publish publish_prim \
>> meta globally-unique="true" clone-max="2" clone-node-max="1"
>> colocation lvm_with_drbd inf: drbd:Master lvm
>> colocation publish_with_lvm inf: lvm publish
>> order drbd_then_lvm inf: drbd:promote lvm symmetrical=true
>> order lvm_then_publish inf: lvm publish symmetrical=true
>>
>> I'd really appreciate any information on how my understanding is
>> deficient, and how to get this working.
>>
>> --
>> Curtis
>>
>> ___
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs:
>> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>



-- 
Dan Frincu
CCNA, RHCE

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Question about Pacemaker master/slave and mysql replication

2011-08-15 Thread Dan Frincu
Hi,

On Sat, Aug 13, 2011 at 2:53 AM, Michael Szilagyi  wrote:
> I'm new to Pacemaker and trying to understand exactly what it can and can't
> do.
> I currently have a small, mysql master/slave cluster setup that is getting
> monitored within Heartbeat/Pacemaker:  What I'd like to be able to do (and
> am hoping Pacemaker will do) is to have 1 node designated as Master and in
> the event of a failure, automatically promote a slave to master and realign
> all of the existing slaves to be slaves of the newly promoted master.
>  Currently what seems to be happening, however, is heartbeat correctly sees
> that a node goes down and pacemaker promotes it up to master but the
> replication is not adjusted so that it is now feeding everyone else.  It
> seems like this should be possible to do from within Pacemaker but I feel
> like I'm missing a part of the puzzle.  Any suggestions would be
> appreciated.

You could try the mysql RA from
https://github.com/fghaas/resource-agents/blob/master/heartbeat/mysql
Last I heard, it had replication support.

HTH.

>
> Here's an output of my crm configure show:
> node $id="7deca2cd-9a64-476c-8ea2-372bca859a4f" four \
> attributes 172.17.0.130-log-file-p_sql="mysql-bin.13"
> 172.17.0.130-log-pos-p_sql="632"
> node $id="9b355ab7-8c81-485c-8dcd-1facedde5d03" three \
> attributes 172.17.0.131-log-file-p_sql="mysql-bin.20"
> 172.17.0.131-log-pos-p_sql="106"
> primitive p_sql ocf:heartbeat:mysql \
> params config="/etc/mysql/my.cnf" binary="/usr/bin/mysqld_safe"
> datadir="/var/lib/mysql" \
> params pid="/var/lib/mysql/novaSQL.pid" socket="/var/run/mysqld/mysqld.sock"
> \
> params max_slave_lag="120" \
> params replication_user="novaSlave" replication_passwd="nova" \
> params additional_parameters="--skip-external-locking
> --relay-log=novaSQL-relay-bin --relay-log-index=relay-bin.index
> --relay-log-info-file=relay-bin.info" \
> op start interval="0" timeout="120" \
> op stop interval="0" timeout="120" \
> op promote interval="0" timeout="120" \
> op demote interval="0" timeout="120" \
> op monitor interval="10" role="Master" timeout="30" \
> op monitor interval="30" role="Slave" timeout="30"
> primitive p_sqlIP ocf:heartbeat:IPaddr2 \
> params ip="172.17.0.96" \
> op monitor interval="10s"
> ms ms_sql p_sql \
> meta target-role="Started" is-managed="true"
> location l_sqlMaster p_sqlIP 10: three
> location l_sqlSlave1 p_sqlIP 5: four
> property $id="cib-bootstrap-options" \
> dc-version="1.0.9-unknown" \
> cluster-infrastructure="Heartbeat" \
> stonith-enabled="false" \
> no-quorum-policy="ignore" \
> last-lrm-refresh="1313187103"
>
> Thanks!
> -Mike.
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
>



-- 
Dan Frincu
CCNA, RHCE

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Split-brain after

2011-08-15 Thread Dan Frincu
On Thu, Aug 11, 2011 at 8:12 PM, Digimer  wrote:
> On 08/11/2011 12:58 PM, Alex Forster wrote:
>> I have a two node Pacemaker/Corosync cluster with no resources configured 
>> yet.
>> I'm running RHEL 6.1 with the official 1.1.5-5.el6 package.
>>
>> While doing various network configuration, I happened to notice that if I 
>> issue
>> a "service network restart" on one node, then approx. four seconds later 
>> issue
>> "service network restart" on the second node, the two nodes become split 
>> brain,
>> each thinking the other is offline.
>>
>> Obviously, issuing 'service network restarts' four seconds apart will not be 
>> a
>> common occurrence in production, but it concerns me that I can 'trick' the 
>> nodes
>> into becoming split-brain so easily. Is there some way I can configure 
>> Corosync
>> to quickly recover from this scenario?

man corosync.conf
You can increase the value for rrp_problem_count_timeout for this.

rrp_problem_count_timeout
  This specifies the time in milliseconds to wait before decrementing
  the problem count by 1 for a particular ring to ensure a link is not
  marked faulty for transient network failures.

  The default is 2000 milliseconds.
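
e.g., in /etc/corosync/corosync.conf (the value is only an
illustration):

totem {
    ...
    rrp_problem_count_timeout: 5000
}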

This, however, will cause issues further along the way, so you need to
take into consideration the timeouts that resources will have, as well
as monitor operations, so that they include the added time from
modifying this value.

Regards,
Dan

p.s.: don't mess with rrp_problem_count_threshold unless you also
consider that (rrp_problem_count_threshold *
rrp_token_expired_timeout) < (token - 50ms) => (10 * 47) < (1000 - 50)
=> 470 < 950 (this is the default, changing
rrp_problem_count_threshold to a higher value would also mean changing
the token timeout and/or other parameters, so it would be best to plan
ahead).

>>
>> Alex
>
> Configuring fence (stonith) will protect against split-brain by causing
> the remote node to be forced offline (rough, but better than split-brain).
>
> --
> Digimer
> E-Mail:              digi...@alteeve.com
> Freenode handle:     digimer
> Papers and Projects: http://alteeve.com
> Node Assassin:       http://nodeassassin.org
> "At what point did we forget that the Space Shuttle was, essentially,
> a program that strapped human beings to an explosion and tried to stab
> through the sky with fire and math?"
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: 
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>



-- 
Dan Frincu
CCNA, RHCE

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Hostname issues

2011-06-21 Thread Dan Frincu
Hi,

On Tue, Jun 21, 2011 at 5:12 PM, Proskurin Kirill
wrote:

> Hello all
>
> I have 4 nodes - all of them with two nic in two network. All of them have
> 2 DNS name - one for internal network and one for external.
>
> This host *must* have a hostname of external network(for other software to
> work). Corosync must works on internal nic.
>
> But it is ask for uname -n for node name and get external name.
>
> How to avoide this? I can`t change hostname to int one and can`t run
> corosync on ext network.
>

Corosync runs on the IP addresses that you configure in
/etc/corosync/corosync.conf.

Pacemaker requires unique hostnames (as they are retrieved via uname -n) to
function properly.

If they are "internal" or "external" it really doesn't matter for the
cluster stack, that's a human point of view, the machine has no real opinion
in this (e.g.: if you configure IP communication channels properly, then
corosync will use the internal network, but the hostnames will be set to the
"external" hostname as these two are unrelated).

Regards


>
> --
> Best regards,
> Proskurin Kirill
>
> __**_
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/**mailman/listinfo/pacemaker<http://oss.clusterlabs.org/mailman/listinfo/pacemaker>
>
> Project Home: http://www.clusterlabs.org
> Getting started: 
> http://www.clusterlabs.org/**doc/Cluster_from_Scratch.pdf<http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf>
> Bugs: http://developerbugs.linux-**foundation.org/enter_bug.cgi?**
> product=Pacemaker<http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker>
>



-- 
Dan Frincu
CCNA, RHCE
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] How to use the crm command to modify the heartbeat / corosync the interval / timeout value?

2011-06-08 Thread Dan Frincu
Hi,

2011/6/8 飞爱曦 

>  How to use the crm command to modify the heartbeat / corosync the interval
> / timeout value?
> For example:
> primitive Filesystem_3 ocf: heartbeat: Filesystem \
> op monitor interval = "120s" timeout = "60s" \
> params device = "-U56c48cba-c365-40fc-8895-d85916755f28" directory = "/
> d7" fstype = "ext3"
> I want to modify the
> op monitor interval = "120s" timeout = "60s"
>  as
>  op monitor interval = "60s" timeout = "30s"
>  how to use the crm command changes?
> I am looking for a long time did not find, for help ~
>
> Is there a way to without "crm configure edit"? I wrote an web interface to
> the detection and controlheartbeat, I need a command to modify the way ...
>
crm configure save filename
Edit the file to change the values as you need, and then:
crm configure load replace filename
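
e.g., from a script (a sketch, assuming the operation string in the
saved file matches exactly):

crm configure save /tmp/cib.txt
sed -i 's/interval="120s" timeout="60s"/interval="60s" timeout="30s"/' /tmp/cib.txt
crm configure load replace /tmp/cib.txt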

>
> Poor zhou =.=
>
>
>
>
>
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
>


-- 
Dan Frincu
CCNA, RHCE
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Preventing auto-fail-back

2011-05-18 Thread Dan Frincu
>
> Thanks!
>
> Daniel Bozeman
>
>
>
> Daniel Bozeman
> American Roamer
> Systems Administrator
> daniel.boze...@americanroamer.com
>
>
>
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
>


-- 
Dan Frincu
CCNA, RHCE
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Very strange behavior on asymmetric cluster

2011-03-19 Thread Dan Frincu
a functional piece of software that cover most
use cases is usually forgotten, because hey, you spent all of those hard
hours making sure the software works, implementing feature after feature,
with the most commonly-used ones as a main target, cross-testing various
hardware platforms, operating systems and writing endless pages of
documentation so that the community will benefit from the first open source
cluster stack and associated software that can compete head-to-head with
pricey commercial clustering solutions on the market, but since your
software doesn't know how to cook cordon-bleu, it's not worth considering
for a long term relationship, it's better to part ways now because it just
won't work between the two of you.

Clearly, many more rants will probably follow this posting, but I'm ok with
that, everyone's got the right to express themselves, whether right or
wrong, which is a relative subject anyway, and please forgive me if I did
step on any toes, it was not my intention.

Regards,
Dan

-- 
Dan Frincu
CCNA, RHCE
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] how can i define service so that switch all other services to onother node if one service failed?

2011-02-28 Thread Dan Frincu

Hi,

On 02/28/2011 02:00 PM, Thomas Elsäßer wrote:

Dear all,

i have read the clusterbook, but i have not understand how can i
configure example three services so that if one service failed, all
three services switch to another node.

Example:

primitive ServiceAmavis lsb:amavis op monitor interval="10" timeout="20s"
primitive ServiceApache lsb:apache2 op monitor interval="10"
primitive ServiceCyrus lsb:cyrus2.2  op monitor interval="10" timeout="20s"

what must i configure if i kill example apache process, that switch all
three to another node?
You need to configure groups so that all resources run together or not
at all:

http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch-advanced-resources.html#group-resources
But also take a look at ordering
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-resource-ordering.html#id2003768
and colocation
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-resource-colocation.html#id1993383
And also check whether the init scripts are LSB compliant
http://refspecs.linux-foundation.org/LSB_3.2.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
And for apache please use the ocf:heartbeat:apache resource agent
http://linux-ha.org/doc/man-pages/re-ra-apache.html
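
Putting it together, a minimal sketch using your primitives (the group
name is my own):

group all-services ServiceAmavis ServiceApache ServiceCyrus
rsc_defaults migration-threshold="1"

The group keeps the three together, and with migration-threshold="1" a
single failure of any member moves them all to the other node.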

HTH,
Dan

Thanks
Thomas



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


--
Dan Frincu
CCNA, RHCE


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] stickiness weirdness please explain

2011-02-24 Thread Dan Frincu

Hi,

On 02/23/2011 06:19 PM, Jelle de Jong wrote:

Dear Dan,

Thank you for taking the time to read and answer my question.

On 23-02-11 09:42, Dan Frincu wrote:

This is something that you should remove from the config, as I
understand it, all resources should run together on the same node and
migrate together to the other node.

   location cli-prefer-ip_virtual01 ip_virtual01 \
       rule $id="cli-prefer-rule-ip_virtual01" inf: #uname eq finley
   location cli-prefer-iscsi02_lun1 iscsi02_lun1 \
       rule $id="cli-prefer-rule-iscsi02_lun1" inf: #uname eq godfrey
   location cli-prefer-iscsi02_target iscsi02_target \
       rule $id="cli-prefer-rule-iscsi02_target" inf: #uname eq finley

I am sorry, I don’t know what I should do with these 6 rules?

After you put a node in standby, if it's the active node, it will migrate 
the resources to the passive node and make that one active. However, you 
must remember to issue the command crm node online $nodename, otherwise 
the node will not be allowed to run resources again. Just as a side note.
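
e.g.:

crm node standby $nodename
(resources fail over to the other node)
crm node online $nodename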

This simplifies resource design and thus keeps the cib smaller, while 
achieving the same functional goal.

Output of ptest -LsVVV and some logs in a pastebin might help.

I changed my configuration according to your comments and the standby
and reboot of both nodes seems to works fine now! Thank you!

http://debian.pastebin.com/LuUGkRLd<  configuration and ptest output

However I still have the problem that I cant seem to move the resources
between nodes with the crm resource move command.
The way I used the crm move command was not to specify the node name. I 
can't remember now why I did that (probably because I also used it on a 
2-node cluster), but the logic was: use crm resource move groupname, and 
it will create a location constraint preventing the resources from the 
group from running on the node that's currently primary. After the 
migration of the resources has occurred, in order to remove the location 
constraint (e.g.: to allow the resources to move back if necessary) you 
must either remove the location constraint from the cib or use crm 
resource unmove groupname; I used the unmove command.
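
e.g. (a sketch, using the group from your config):

crm resource move rg_iscsi
(wait for the migration to finish)
crm resource unmove rg_iscsi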


Just to be clear:

1. resources on finley ==> crm resource move ==> resources move to 
godfrey ==> crm resource unmove ==> resources remain on godfrey (we've 
just removed the constraint, but the resource stickiness prevents the 
ping-pong effect)
2. resources on godfrey ==> crm resource move ==> resources move to 
finley ==> crm resource unmove ==> resources remain on finley (same as 1 
but from a different view)


Things to be aware of:

1. resources on a node ==> crm resource move ==> before the resources 
finish migrating you issue crm resource unmove ==> the resources don't 
finish migrating to the other node and come back to the original node 
(so don't get finger happy on the keyboard, give the resources time to 
move).
2. resources on finley ==> crm resource move ==> resources move to 
godfrey ==> godfrey crashes ==> resources don't migrate to finley 
(because the crm resource unmove command was not issued, so the location 
constraint preventing the resources from running on finley is still in 
place, even if finley is the last node in the cluster) ==> crm resource 
unmove ==> resources start on finley


One thing to test would be to first remove any config that looks like this
location cli-prefer-rg_iscsi rg_iscsi \
rule $id="cli-prefer-rule-rg_iscsi" inf: #uname eq finley
With reference either to finley or to godfrey. Reboot both nodes, let 
them start and settle on a location, do a crm configure save 
initial.config. Issue the crm resource move (let them migrate), then crm 
configure save migrated.config, then crm resource unmove, then crm 
configure save unmigrated.config, and compare the results. This way 
you'll see how the setup looks and what rules are added and removed 
during the process.


If the move command somehow doesn't work, you might want to check whether 
you've configured resource-level fencing for DRBD, 
http://www.drbd.org/users-guide/s-pacemaker-fencing.html
The fence-peer handler will add a constraint in some cases (such as when 
you put a node in standby) preventing the DRBD resource from running. When 
you bring a node online after disk changes, DRBD has to sync some data, 
and until the data is synced the constraint stays in place, so issuing a 
crm resource move while DRBD is syncing won't have the expected outcome 
(again, the reference to being finger happy on the keyboard). After the 
sync is done, the crm-unfence-peer.sh script removes the constraint, and 
then the move command will work.
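
For reference, the drbd.conf fragment from that guide looks roughly like 
this:

resource <resource> {
  disk { fencing resource-only; }
  handlers {
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
  ...
}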


Just a couple of things to keep in mind.

HTH,
Dan


Would you be willing to take a

Re: [Pacemaker] stickiness weirdness please explain

2011-02-23 Thread Dan Frincu
Hi,

On Tue, Feb 22, 2011 at 7:21 PM, Jelle de Jong wrote:

> Hello everybody,
>
> I got the following setup: http://debian.pastebin.com/Sife0hTz
>
> The problem is that when I crm node standby the godfrey node2 everything
> nicely migrates to finley node1 and continues to run. (as expected) when
> godfrey comes back online and finished synchronising the drbd disks it
> tries to take over the resources of finley and fails crashing the iscsi
> and drbd systems
>

This is something that you should remove from the config; as I understand
it, all resources should run together on the same node and migrate together
to the other node.

   location cli-prefer-ip_virtual01 ip_virtual01 \
       rule $id="cli-prefer-rule-ip_virtual01" inf: #uname eq finley
   location cli-prefer-iscsi02_lun1 iscsi02_lun1 \
       rule $id="cli-prefer-rule-iscsi02_lun1" inf: #uname eq godfrey
   location cli-prefer-iscsi02_target iscsi02_target \
       rule $id="cli-prefer-rule-iscsi02_target" inf: #uname eq finley


Try removing property default-resource-stickiness="200" and adding a section
with:
rsc_defaults $id="rsc-options" \
resource-stickiness="200"

And also maybe increasing the value to 1000 from 200.

I see that both groups rg_iscsi01 and rg_iscsi02 start on the same node, and
the general order would be promote DRBD, start the virtual IP, start the
targets and then the luns. I would suggest:
group rg_iscsi ip_virtual01 iscsi01_target iscsi01_lun1 iscsi02_target
iscsi02_lun1 iscsi02_lun2 iscsi02_lun3 iscsi02_lun4
(instead of the 2 groups)
All colocation drbd_rx-master-with-ip inf: ms_drbd_rx:Master ip_virtual01
lines removed
All colocation iscsi0x-with-drbd-master inf: rg_iscsi01 ms_drbd_rx:Master
lines changed to
colocation iscsi0x-with-drbd-master inf: rg_iscsi ms_drbd_rx:Master
(replacing x with the appropriate values)
All order ip-after-drbd_rx inf: ms_drbd_rx:promote ip_virtual01:start lines
removed
All order iscsi0x-after-drbd-promote inf: ms_drbd_rx:promote
rg_iscsi0x:start lines changed to
order iscsi0x-after-drbd-promote inf: ms_drbd_rx:promote rg_iscsi:start
(replacing x ...)

This simplifies resource design and thus keeps the cib smaller, while
achieving the same functional goal.


> I have to stop corosync on both nodes and start them again make both
> nodes standby and then online godfrey and then online finley to get it
> all working again.
>
> Why doesn't it stay running on the finley node when godfrey comes back
> online?
>
> I am also unable to move the iscsi luns with all his depending resources
> to finley and back and forwards by using crm resource move finely.
>
> Why can't I manually move resources around?
>
>
Output of ptest -LsVVV and some logs in a pastebin might help.

Regards,
Dan


> There is probably something I am not doing right but please help me out
> i read the Cluster_from_Scratch.pdf,
> Pacemaker-1.0-Pacemaker_Explained-en-US and ha-iscsi.pdf.
>
> Thanks in advance,
>
> With kind regards,
>
> Jelle de Jong
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>



-- 
Dan Frincu
CCNA, RHCE
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] First confused (then enlightened ? :)

2011-02-15 Thread Dan Frincu
On Wed, Feb 16, 2011 at 9:32 AM, Dan Frincu  wrote:

> On Tue, Feb 15, 2011 at 1:02 PM, Carlos G Mendioroz wrote:
>
>> Andrew Beekhof @ 15/02/2011 04:25 -0300 dixit:
>>
>>  For what I understand, you want the brains of the action at pacemaker,
>>>> so VRRP, HSRP or (U)CARP seem more a trouble than a solution.
>>>> (i.e. twin head) right ?
>>>>
>>>> In other words, it seems to better align with the solution idea to
>>>> have pacemaker decide and some script-set do the changing.
>>>>
>>>
>>> What you typically want to avoid is having two isolated entities
>>> trying to make decisions in the cluster - pulling it to pieces in the
>>> process.
>>>
>> Right, makes a lot of sense, only one boss in the office and one place
>> to define policy.
>> But to integrate with other protocols thought as independent, like VRRP
>> or (U)CARP, the "dependency" has to be implemented.
>>
>>
>>  Something like DRBD solves this by using crm_master to tell Pacemaker
>>> which instance it would like promoted, but not actually doing the
>>> promotion itself.
>>>
>>> I don't know if this is feasible for your application.
>>>
>>>  In my case, it seems better to get rid of VRRP and use a more
>> comprehensive look of pacemaker.
>>
>>
>>  Nevertheless, I don't see the concerns of MAC mutation being addressed
>>>> anywhere. And I have my suspocious at ARP caches too.
>>>>
>>>
>>> Both would be properties of the RA itself rather than Pacemaker or
>>> Heartbeat.
>>> So if you can script MAC mutation, you can also create an RA for it
>>> (or add it to an existing one).
>>>
>>
>> Is there a "guide to implemented RAs" ?
>> I've seen that the shell can list them. Are they embedded or just
>> showing a directory of entities found in some predefined places ?
>
>
> http://www.linux-ha.org/doc/dev-guides/ra-dev-guide.html - The OCF
> Resource Agent Developer's Guide
> http://www.linux-ha.org/wiki/Resource_Agents - Resource Agents
> http://www.linux-ha.org/wiki/OCF_Resource_Agents - OCF Resource Agents
>
> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/index.html#ap-ocf - OCF Resource Agents
>
> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Clusters_from_Scratch/index.html#id2281146 - Listing Resource Agents
>
>
And not to forget

http://www.linux-ha.org/doc/users-guide/users-guide.html - The Linux-HA
User’s Guide
and
http://www.linux-ha.org/doc/man-pages/man-pages.html - Linux-HA Manual Pages


> HTH
>
>
>>
>>
>>  I'm currently thinking about a couple of ideas:
>>>> -using mac-vlan to move an active mac from one server to another
>>>> -using bonding to have something like a MEC, multichasis ether channel.
>>>> (i.e. a way to not only migrate the MAC but also to signal the migration
>>>> to the attachment switch using 802.1ad)
>>>>
>>>> Are there any statistics on how much time does it take to migrate
>>>> an IP address by current resource ? (IPAddr2 I guess)
>>>> I'm looking for a subsecond delay since failure detection,
>>>> and I guess it's obvious, an active-standby setup.
>>>>
>>>
>>> I've not done any measurements lately.
>>> Mostly its dependent on how long the RA takes.
>>>
>>
>> Ok, now I'm getting into RA arena I guess.
>> For speedy failover, I would need a hot standby approach. Is that a
>> pacemaker known state ?
>>
>> --
>> Carlos G MendiorozLW7 EQI  Argentina
>>
>> ___
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs:
>> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>>
>
>
>
> --
> Dan Frincu
> CCNA, RHCE
>
>


-- 
Dan Frincu
CCNA, RHCE
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] First confused (then enlightened ? :)

2011-02-15 Thread Dan Frincu
On Tue, Feb 15, 2011 at 1:02 PM, Carlos G Mendioroz wrote:

> Andrew Beekhof @ 15/02/2011 04:25 -0300 dixit:
>
>  For what I understand, you want the brains of the action at pacemaker,
>>> so VRRP, HSRP or (U)CARP seem more a trouble than a solution.
>>> (i.e. twin head) right ?
>>>
>>> In other words, it seems to better align with the solution idea to
>>> have pacemaker decide and some script-set do the changing.
>>>
>>
>> What you typically want to avoid is having two isolated entities
>> trying to make decisions in the cluster - pulling it to pieces in the
>> process.
>>
> Right, makes a lot of sense, only one boss in the office and one place
> to define policy.
> But to integrate with other protocols thought as independent, like VRRP
> or (U)CARP, the "dependency" has to be implemented.
>
>
>  Something like DRBD solves this by using crm_master to tell Pacemaker
>> which instance it would like promoted, but not actually doing the
>> promotion itself.
>>
>> I don't know if this is feasible for your application.
>>
>>  In my case, it seems better to get rid of VRRP and use a more
> comprehensive look of pacemaker.
>
>
>  Nevertheless, I don't see the concerns of MAC mutation being addressed
>>> anywhere. And I have my suspocious at ARP caches too.
>>>
>>
>> Both would be properties of the RA itself rather than Pacemaker or
>> Heartbeat.
>> So if you can script MAC mutation, you can also create an RA for it
>> (or add it to an existing one).
>>
>
> Is there a "guide to implemented RAs" ?
> I've seen that the shell can list them. Are they embedded or just
> showing a directory of entities found in some predefined places ?


http://www.linux-ha.org/doc/dev-guides/ra-dev-guide.html - The OCF Resource
Agent Developer's Guide
http://www.linux-ha.org/wiki/Resource_Agents - Resource Agents
http://www.linux-ha.org/wiki/OCF_Resource_Agents - OCF Resource Agents
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/index.html#ap-ocf - OCF Resource Agents
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Clusters_from_Scratch/index.html#id2281146 - Listing Resource Agents

HTH


>
>
>  I'm currently thinking about a couple of ideas:
>>> -using mac-vlan to move an active mac from one server to another
>>> -using bonding to have something like a MEC, multichasis ether channel.
>>> (i.e. a way to not only migrate the MAC but also to signal the migration
>>> to the attachment switch using 802.1ad)
>>>
>>> Are there any statistics on how much time does it take to migrate
>>> an IP address by current resource ? (IPAddr2 I guess)
>>> I'm looking for a subsecond delay since failure detection,
>>> and I guess it's obvious, an active-standby setup.
>>>
>>
>> I've not done any measurements lately.
>> Mostly its dependent on how long the RA takes.
>>
>
> Ok, now I'm getting into RA arena I guess.
> For speedy failover, I would need a hot standby approach. Is that a
> pacemaker known state ?
>
> --
> Carlos G MendiorozLW7 EQI  Argentina
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>



-- 
Dan Frincu
CCNA, RHCE
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] First confused (then enlightened ? :)

2011-02-15 Thread Dan Frincu
Hi,



>>> Is there a searchable repository of the list content so I may find
>>> if some of my doubts are already explained ?
>>>
>>
> Answering myself, I found that this (and some related lists) are archived
> and indexed at GossamerThreads,
>http://www.gossamer-threads.com/lists/linuxha
> I usually find that indexing a list like this is an invaluable tool, so
> here for the record.
>

For future reference, maybe this method will help someone else. From
http://www.clusterlabs.org/wiki/Mailing_lists there are 3 main archives:
- http://oss.clusterlabs.org/pipermail/pacemaker
- http://lists.linux-ha.org/pipermail/linux-ha
- http://lists.linux-foundation.org/pipermail/openais
+ 1 for drbd
- http://lists.linbit.com/pipermail/drbd-user/

What I do is to take the gzipped archives from all of the above, extract
them as text and index them with google desktop for quick reference.

Here's the one liner to do that:

for i in http://oss.clusterlabs.org/pipermail/pacemaker
http://lists.linux-ha.org/pipermail/linux-ha
http://lists.linux-foundation.org/pipermail/openais
http://lists.linbit.com/pipermail/drbd-user/ ; do mkdir -p $(pwd)/${i##*/}
&& for j in $(wget $i -O - 2>/dev/null | awk -F '"' -v var="$i" '/.gz/
{print var"/"$2}') ; do wget $j -P $(pwd)/${i##*/} 2>/dev/null; done &&
gunzip $(pwd)/${i##*/}/*.gz 2>/dev/null; done

Regards,
Dan

-- 
Dan Frincu
CCNA, RHCE
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] First confused (then enlightened ? :)

2011-02-14 Thread Dan Frincu
Hi,

On Mon, Feb 14, 2011 at 10:44 AM, Andrew Beekhof  wrote:

> On Sun, Feb 13, 2011 at 12:56 PM, Carlos G Mendioroz 
> wrote:
> > Hi,
> > I'm trying to understand how to best implement a HA EE service.
> > Searching for linux + HA induced me to go linux-HA way (heartbeat?)
> > and that showed the path to pacemaker, and that open the way to corosync
> > and OpenAIS. No news for you I guess.
> >
> > Is there a searchable repository of the list content so I may find
> > if some of my doubts are already explained ?
> >
> > I have otherwise some specific questions:
> >
> > -Is still the case that Heartbeat is not to be considered for new
> > deployments ? (I read something along that line)
>
> pretty much
>
>
> http://www.clusterlabs.org/wiki/FAQ#Should_I_Run_Pacemaker_on_Heartbeat_or_Coroysnc.3F
>
> > -All IBM docs (RedBooks) point to Heartbeat. Is that because they are old
> ?
>
> yes
>

They may be old, but they're catching up =>
http://www.ibm.com/developerworks/cloud/library/cl-highavailabilitycloud/

Regards,
Dan


>
> > -I'm looking to implement Active + hot standby, and given my present
> > state of understanding, I was targetting something like VRRP + cluster
> > aware service. As I have a networking background, implementing VRRP
> > sounds sweet, but not migrating the MAC is kind of bitter.
> >
> > Is it ok to discuss such details here ?
>
> sure
>
> also have a look at clusters from scratch:
>http://www.clusterlabs.org/doc
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>



-- 
Dan Frincu
CCNA, RHCE
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] ping directive configuration

2011-02-09 Thread Dan Frincu
>>> > > cause=C_IPC_MESSAGE origin=handle_response ]
>>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: info: process_pe_message:
>>> > > Transition 59: PEngine Input stored in:
>>> /var/lib/pengine/pe-input-82.bz2
>>> > > Feb  1 09:01:06 crhnode2 crmd: [3742]: info: unpack_graph: Unpacked
>>> > > transition 59: 14 actions in 14 synapses
>>> > > Feb  1 09:01:06 crhnode2 pengine: [4103]: info: process_pe_message:
>>> > > Configuration ERRORs found during PE processing.  Please run
>>> "crm_verify
>>> > > -L" to identify issues.
>>> > >
>>> > >
>>> > >
>>> > > here is my current configuration
>>> > >
>>> > >
>>> > > node $id="271808bb-ed74-4eaa-8c94-bf32a00074dd" crhnode1 \
>>> > > attributes standby="off"
>>> > > node $id="59440607-2a5c-450e-84fa-94bf69742671" crhnode2 \
>>> > > attributes standby="off"
>>> > > primitive crhweb ocf:heartbeat:apache \
>>> > >
>>> > > params configfile="/etc/httpd/conf/httpd.conf" \
>>> > > op monitor interval="60s" \
>>> > > meta target-role="Started"
>>> > > primitive failoverip ocf:heartbeat:IPaddr \
>>> > > params ip="10.100.1.100" cidr_netmask="255.255.0.0" \
>>> > > op monitor interval="30s" \
>>> > > meta target-role="Started"
>>> > > primitive pingd ocf:pacemaker:pingd \
>>> > > params dampen="5s" host_list="10.100.0.254" multiplier="1000"
>>> > > name="pingval" \
>>> > > operations $id="pingd-operations" \
>>> > > op monitor interval="10s" timeout="20s" \
>>> > > op monitor interval="90s" timeout="25s" start \
>>> > > op monitor interval="100s" timeout="25s" stop
>>> > > clone connected pingd \
>>> > >
>>> > > meta globally-unique="false" target-role="started"
>>> > > location cli-prefer-crhweb crhweb \
>>> > >
>>> > > rule $id="cli-prefer-rule-crhweb" inf: #uname eq crhnode1
>>> > > location crhweb_on_connected_node crhweb \
>>> > > rule $id="crhweb_on_connected_node-rule" -inf: not_defined
>>> > > pingval or pingval lte 0
>>> > >
>>> > > location prefer-crhnode1 crhweb 50: crhnode1
>>> > > colocation crhweb-with-failoverip inf: crhweb failoverip
>>> > > order crhweb-after-failoverip inf: pingd failoverip crhweb
>>> > >
>>> > > property $id="cib-bootstrap-options" \
>>> > > dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3"
>>> \
>>> > > cluster-infrastructure="Heartbeat" \
>>> > > stonith-enabled="false" \
>>> > > no-quorum-policy="ignore"
>>> > >
>>> > > On 1 February 2011 07:21, Nikita Michalko
>>> wrote:
>>> > >> Hi Paul,
>>> > >>
>>> > >> see below!
>>> > >>
>>> > >> Am Montag, 31. Januar 2011 19:55 schrieb paul harford:
>>> > >> > HI guys
>>> > >> > i'm having some issues with a ping directive, my current config is
>>> > >> > below and basically i want the web resource to failover to the
>>> second
>>> > >> > node if
>>> > >>
>>> > >> the
>>> > >>
>>> > >> > ping can no longer contact the default gateway
>>> > >> >
>>> > >> > so here goes
>>> > >> >
>>> > >> > crm configure primitive ping ocf:pacemaker:ping params dampen=5s
>>> > >> > host_list=(default GateWay) multplier=1000 name=pingval operations
>>> > >> > $id=ping-operations op moinitor interval=10s timeout=15s
>>> > >>
>>> > >>  - this is surely wrong: "moinitor" ?
>>> > >>  - no such primitive (ping) below ...
>>> > >>
>>> > >> HTH
>>> > >>
>>> > >> Nikita Michalko
>>> > >>
>>> > >> > and
>>> > >> >
>>> > >> > crm configure clone connected ping meta globally-unique=false
>>> > >> > target-role=started
>>> > >> >
>>> > >> > and
>>> > >> >
>>> > >> > location web_on_connected_node cweb rule
>>> > >> > $id=web_on_connected_node-rule -inf: not_defined pingval or
>>> pingval
>>> > >> > lte 0
>>> > >> >
>>> > >> >
>>> > >> > Does anyone see any issues with the above configuration? I want
>>> > >> > to check first as the last time I tried it wouldn't work and my
>>> > >> > resources would not failover or start
>>> > >> >
>>> > >> >
>>> > >> >
>>> > >> >
>>> > >> > node $id="271808bb-ed74-4eaa-8c94-bf32a00074dd" crhnode1 \
>>> > >> > attributes standby="off"
>>> > >> > node $id="59440607-2a5c-450e-84fa-94bf69742671" crhnode2 \
>>> > >> > attributes standby="off"
>>> > >> > primitive cweb ocf:heartbeat:apache \
>>> > >> > params configfile="/etc/httpd/conf/httpd.conf" \
>>> > >> > op monitor interval="60s" \
>>> > >> > meta target-role="Started"
>>> > >> > primitive failoverip ocf:heartbeat:IPaddr \
>>> > >> > params ip="10.100.1.100" cidr_netmask="255.255.0.0" \
>>> > >> > op monitor interval="30s" \
>>> > >> > meta target-role="Started"
>>> > >> > location cli-prefer-cweb cweb \
>>> > >> > rule $id="cli-prefer-rule-crhweb" inf: #uname eq crhnode1
>>> > >> > location prefer-crhnode1 crhweb 50: crhnode1
>>> > >> > colocation cweb-with-failoverip inf: cweb failoverip
>>> > >> > order crhweb-after-failoverip inf: failoverip cweb
>>> > >> > property $id="cib-bootstrap-options" \
>>> > >> >
>>> dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
>>> > >> > cluster-infrastructure="Heartbeat" \
>>> > >> > stonith-enabled="false" \
>>> > >> > no-quorum-policy="ignore"
>>> > >> > rsc_defaults $id="rsc-options" \
>>> > >> > resource-stickiness="100"
>>> > >>
>>> > >> ___
>>> > >> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>> > >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>> > >>
>>> > >> Project Home: http://www.clusterlabs.org
>>> > >> Getting started:
>>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> > >> Bugs:
>>> > >>
>>> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemake
>>> > >>r
>>>
>>>
>>> ___
>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs:
>>> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>>>
>>
>>
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
>


-- 
Dan Frincu
CCNA, RHCE
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] Fwd: Upgrade from openais-0.80 failed

2011-01-26 Thread Dan Frincu
-- Forwarded message --
From: Dan Frincu 
Date: Wed, Jan 26, 2011 at 1:10 PM
Subject: Re: [Pacemaker] Upgrade from openais-0.80 failed
To: "Raoul Bhatia [IPAX]" 


Hi,

On Wed, Jan 26, 2011 at 12:21 PM, Dan Frincu  wrote:

> Hi,
>
> On Wed, Jan 26, 2011 at 11:37 AM, Raoul Bhatia [IPAX] wrote:
>
>> On 01/26/2011 10:06 AM, Dan Frincu wrote:
>> > Hi,
>> >
>> > I've got a pair of servers running on RHEL5 x86_64 with openais-0.80
>> > (older install) which I want to upgrade to corosync-1.3.0 +
>> > pacemaker-1.0.10. Downtime is not an issue and corosync 1.3.0 is needed
>> > for UDPU, so I built it from the corosync.org <http://corosync.org>
>> > website and openais 1.1.4 from openais.org <http://openais.org>
>> website.
>>
>> hi,
>>
>> you do not need both corosync and openais as corosync supersedes
>> openais: http://corosync.org/doku.php?id=faq:why
>>
>> on my debian squeeze based system, i see corosync only:
>>
>> > # dpkg -l|grep -i coro
>> > ii  corosync   1.2.1-4
>> > ii  libcorosync4   1.2.1-4
>> > root@wdb01 ~ #
>>
>> vs.
>> > root@wdb01 ~ # dpkg -l|grep -i ais
>> > root@wdb01 ~ #
>>
>
Removed everything after removing the RPM's, just to be extra paranoid about
leftovers (rpm -qpl *.rpm >> file && for i in `cat file`; do [[ -e "$i" ]]
&& echo "$i" >> newfile ; done && for i in `cat newfile` ; do rm -rf $i ;
done)

Installed RPMs (without openais)

Same output

http://pastebin.com/3iPHSXua

Regards,
Dan


>
>>
>> > Logs: http://pastebin.com/i0maZM4p
>>
>> your logfiles tell that pacemaker 1.0.9 is started (line 55):
>> > Jan 25 11:19:39 corosync [SERV  ] Service engine loaded: Pacemaker
>> Cluster Manager 1.0.9
>>
>
> To be honest I remember reading an email (can't find it now) on this
> mailing list about some Pacemaker still outputting 1.0.9 to logs even if
> its version is 1.0.10, that's why I didn't look in depth at this.
>
>
>>
>> on the other hand, line 59 says:
>> > Jan 25 11:19:39 cluster1 crmd: [9722]: info: main: CRM Hg Version:
>> da7075976b5ff0bee71074385f8fd02f296ec8a3
>>
>> which should be 1.0.10 (/me is puzzled)
>>
>>
>> can you purge all related packages once more and verify, that
>> all binaries are gone?
>>
>
> I've done that already, and still feel there's some leftover. Will remove
> them again and start over.
>
> Regards,
> Dan
>
>
>>
>> cheers,
>> raoul
>> --
>> 
>> DI (FH) Raoul Bhatia M.Sc.  email.  r.bha...@ipax.at
>> Technischer Leiter
>>
>> IPAX - Aloy Bhatia Hava OG  web.  http://www.ipax.at
>> Barawitzkagasse 10/2/2/11   email.off...@ipax.at
>> 1190 Wien   tel.   +43 1 3670030
>> FN 277995t HG Wien  fax.+43 1 3670030 15
>> 
>>
>
>
>
> --
> Dan Frîncu
> CCNA, RHCE
>
>


-- 
Dan Frîncu
CCNA, RHCE




-- 
Dan Frîncu
CCNA, RHCE
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] SNMP subagent for pacemaker/corosync

2011-01-26 Thread Dan Frincu
Hi,

On Wed, Jan 26, 2011 at 1:41 PM, Michael Schwartzkopff  wrote:

> Hi,
>
> any volunteers to have look on the old heartbeat SNMP subagent? I'd like to
> see that program working in a pacemaker/corosync cluster. Depending on the
> conditions I am willing to offer real money for that task if that could not
> be
> done in a community effort.
>

You should take a look at this
https://lists.linux-foundation.org/pipermail/openais/2011-January/015617.html


Regards,
Dan


>
> Greetings,
>
> --
> Dr. Michael Schwartzkopff
> Guardinistr. 63
> 81375 München
>
> Tel: (0163) 172 50 98
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
>


-- 
Dan Frîncu
CCNA, RHCE
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] Upgrade from openais-0.80 failed

2011-01-26 Thread Dan Frincu
Hi,

I've got a pair of servers running on RHEL5 x86_64 with openais-0.80 (older
install) which I want to upgrade to corosync-1.3.0 + pacemaker-1.0.10.
Downtime is not an issue and corosync 1.3.0 is needed for UDPU, so I built
it from the corosync.org website and openais 1.1.4 from openais.org website.


With pacemaker, we won't be using the heartbeat stack, so I built the
pacemaker package from the clusterlabs.org src.rpm without heartbeat
support. To be more precise I used

rpmbuild --without heartbeat --with ais --with snmp --with esmtp -ba
pacemaker-epel.spec

Now I've tested the rpm list below on a pair of XEN VM's, it works just
fine.

cluster-glue-1.0.6-1.6.el5.x86_64.rpm
cluster-glue-libs-1.0.6-1.6.el5.x86_64.rpm
corosync-1.3.0-1.x86_64.rpm
corosynclib-1.3.0-1.x86_64.rpm
libesmtp-1.0.4-5.el5.x86_64.rpm
libibverbs-1.1.2-1.el5.x86_64.rpm
librdmacm-1.0.8-1.el5.x86_64.rpm
libtool-ltdl-1.5.22-6.1.x86_64.rpm
openais-1.1.4-2.x86_64.rpm
openaislib-1.1.4-2.x86_64.rpm
openhpi-2.10.2-1.el5.x86_64.rpm
openib-1.3.2-0.20080728.0355.3.el5.noarch.rpm
pacemaker-1.0.10-1.4.x86_64.rpm
pacemaker-libs-1.0.10-1.4.x86_64.rpm
perl-TimeDate-1.16-5.el5.noarch.rpm
resource-agents-1.0.3-2.6.el5.x86_64.rpm

However when performing the upgrade on the servers running openais-0.80,
first I removed the heartbeat, heartbeat-libs and PyXML rpms (conflicting
dependencies issue) then rpm -Uvh on the rpm list above. Installation went
fine, removed existing cib.xml and signatures, fresh start. Then I
configured corosync, then started it on both servers, and nothing. At first
I got an error related to pacemaker mgmt, which was an old package installed
with the old rpms. Removed it, tried again. Nothing. Removed all cluster
related rpms old and new + deps, except for DRBD, then installed the list
above, then again, nothing. What nothing means:
- corosync starts, never elects DC, never sees the other node or itself for
that matter.
- can stop corosync via the init script, it goes into an endless phase where
it just prints dots to the screen, have to kill the process to make it stop.


Troubleshooting done so far:
- tested network sockets (nc from side to side), firewall rules (iptables
down), communication is ok
- searched for the original RPM's list, removed all remaining RPMs, ran
ldconfig, removed new RPM's, installed new RPM's

My guess is that there are some leftovers from the old openais-0.80
installation, which mess with the current installation, seeing as how the
same set of RPMs on a pair of XEN VM's with the same OS work fine, however I
cannot put my finger on the culprit for the real servers' issue.

Logs: http://pastebin.com/i0maZM4p

Ideas, suggestions?

TIA.

Regards,
Dan

-- 
Dan Frîncu
CCNA, RHCE
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Making utilizations dynamic

2011-01-25 Thread Dan Frincu
Hi,

On Tue, Jan 25, 2011 at 2:41 PM, Robert van Leeuwen  wrote:

> > > The only thing to do that remains would be a daemon that switches off
> > > unused machines to save energy. But this could be done using STONITH
> > > agents.
> > >
> > > Basically this would be an option to make cloud computing really green!
> > >
> > > Please mail me your comments about this idea. Thanks.
> > >
> > > Cheers,
> >
> > No reply, no comments? Nothing at all?
>
>
I for one think that this type of green computing you're trying to
envision makes a lot of sense. I know that RedHat is one of the powers
behind this project and they already have something like this vision, but
more integrated and with all the nice GUI's already implemented in RHEV.

Yes, I second the fact that it adds complexity, but then you can just
split the large cluster into several smaller clusters (if it's really
called for). Anyway, there are solutions to this kind of issue.

I was thinking of taking this idea a step further, using the example of
openvz-based VM's which can have limits set on their CPU usage, either as
a percentage or as cpuunits. The utilization feature could (theoretically)
allow moving multiple VM's together based on their CPU usage. In this
case let's say you've got a quad-core CPU, that would mean 400%, and
allocating 100% per VM (or each VM gets a CPU core) you can put a maximum
of 4 VM's per node (or whatever other scenario one might think of, this is
just an example).

Actually when I first read the utilization feature I immediately thought of
this scenario.
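
In crm shell syntax, a minimal sketch of that scenario could look like
this (node and resource names are made up, ocf:heartbeat:ManageVE is the
openvz container agent, and it assumes a Pacemaker build with the
utilization feature and the placement-strategy property):

node node1 utilization cpu="400"
primitive ve101 ocf:heartbeat:ManageVE \
        params veid="101" \
        utilization cpu="100"
property placement-strategy="utilization"

With four such primitives each declaring cpu="100", node1 would take at
most 4 of them.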

Regards,
Dan


>
> Hello Michael,
>
> Although in theory the idea is sound I use the cluster suite primarily for
> customers demanding high availability.
> Adding complexity, turning server's off & on and moving resources around
> probably won't be beneficial to the uptime off the resources & hosts.
> Turning nodes off also effects the number of quorum votes.
> If you have a major disaster when a part of the cluster is power-ed down
> you might create an scenario where the cluster can not recover itself but it
> would have recovered if all nodes were still running.
>
> I could see these features being very useful in non-ha environments like
> testing/development but most of MY customers cannot use this feature or
> would rather pay the power bill...
>
> Just my 2 cents,
>
> Robert van Leeuwen
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>



-- 
Dan Frîncu
CCNA, RHCE
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Problem with rrp_mode

2011-01-19 Thread Dan Frincu

Hi,

Andrew Beekhof wrote:

I dont think rrp is well tested by upstream.
You might want to ask on the corosync ML to be sure.

On Wed, Jan 19, 2011 at 3:43 PM, Michael Schwartzkopff
 wrote:
  

Hi,

I have two network cards and configured corosync-1.2.7 with
rrp_mode: active

at first corosync-cfg -s tells me
Printing ring status.
Local node ID 1210452490
RING ID 0
   id  = 10.10.38.72
   status  = ring 0 active with no faults
RING ID 1
   id  = 10.10.40.115
   status  = ring 1 active with no faults

after a very short time I see the following:

RING ID 1
   id  = 10.10.40.115
   status  = Incrementing problem counter for seqid 1352 iface
10.10.40.115 to [3 of 10]

and finally:
RING ID 1
   id  = 10.10.40.115
   status  = Marking seqid 1390 ringid 1 interface 10.10.40.115 FAULTY -
adminisrtative intervention required.

Anybody being successful at all using rrp_mode with corosync?

I use rrp_mode: active with two network bonds (4 network cards) and what 
I can tell you is that when I perform tests by shutting down the switch 
ports for one of the bonds, it starts to increment the counter then 
marks the ring as faulty, just as you mentioned, which is normal because 
I just shut the ports.


If you're encountering this without having network connectivity 
problems, then yes, it's an issue, otherwise, when the connection per 
ring is lost, that's the normal output.


Also when the network connectivity is restored the ring doesn't restore 
its previous condition automatically; that's a feature to be implemented 
in the corosync 2.y.z branch (a.k.a. Weaver's Needle). For now, after 
restoring the network connectivity, to get the ring back you either run 
corosync-cfgtool -r manually or have some script monitor the rings and 
do it for you.
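
A minimal sketch of such a watcher, assuming corosync-cfgtool is in the
PATH and that issuing a reset while a ring is still down is harmless:

#!/bin/bash
# re-enable the rings whenever corosync marks one of them FAULTY
while sleep 10; do
    corosync-cfgtool -s | grep -q FAULTY && corosync-cfgtool -r
done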


HTH,
Dan

Greetings.

--
Dr. Michael Schwartzkopff
Guardinistr. 63
81375 München

Tel: (0163) 172 50 98

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker





___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
  


--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Some questions about corosync-cfgtool

2011-01-18 Thread Dan Frincu

Hi,

https://lists.linux-foundation.org/pipermail/openais/2011-January/015626.html

xin.li...@cs2c.com.cn wrote:

hi everybody!

I have some questions about corosync-cfgtool
   
1. What should I do when "corosync-cfgtool -s" returns "Could not 
initialize corosync configuration API error"? Restart corosync? (I 
don't think it's a good idea)


2. How can the process happen automatically when the network problem 
is repaired, instead of using "corosync-cfgtool -r" manually?




My testing environment is :
2 PC (on virtualbox-3.2.12)
Double network cards and double heartbeat links (eth0 and eth1)
OS: RHEL 5.3 x86
primary rpms: corosync-1.3.0 and pacemaker-1.0.10
corosync.conf:(relevant portion)
   
compatibility: whitetank


totem {
version: 2
secauth: off
threads: 0
rrp_mode: active
interface{
ringnumber:0
bindnetaddr:10.10.10.0
mcastaddr:235.3.4.5
mcastport:9876
}
interface{
ringnumber:1
bindnetaddr:20.20.20.0
mcastaddr:235.3.4.6
mcastport:9877
}   
}




   
When something bad happens on one of the two heartbeat links,
like: ifdown eth0, or pulling out one of the network
cables -


I use (after ifdown eth1):
corosync-cfgtool -s
found that:
Printing ring status.
Local node ID 185207306
RING ID 0
id= 10.10.10.11
status= ring 0 active with no faults
RING ID 1
id= 20.20.20.11
status= Marking seqid 14089 ringid 1 interface 20.20.20.11 
FAULTY - adminisrtative intervention required.

then:
ifup eth1
and:
corosync-cfgtool -r

The problem is repaired.

BUT, I want this process to happen automatically when the network 
problem is repaired, so I wrote this shell script and start it when the 
corosync service starts:


 ~~~


#!/bin/bash
local_is_down=0
ip_res=`corosync-cfgtool -s|grep id|awk '{print $3}'`
while [ -z "$ip_res" ]
do
    sleep 5
    ip_res=`corosync-cfgtool -s|grep id|awk '{print $3}'`
done

ip_num=`corosync-cfgtool -s|grep "no faults"|wc -l`

while true
do
    sleep 10
    res=`corosync-cfgtool -s`
    echo "$res" | grep FAULTY &> /dev/null
    if [ "$?" -ne 0 ]; then
        # no FAULTY ring: clear the flag once all rings report "no faults"
        tmp_num=`echo "$res"|grep "no faults"|wc -l`
        if [ "$tmp_num" -eq "$ip_num" ]; then
            local_is_down=0
        else
            continue
        fi
    else
        mii-tool | grep "no link" &> /dev/null
        if [ $? -eq 0 ]; then
            local_is_down=1   # a network cable was pulled out
        else
            for IP in $ip_res
            do
                ifconfig | grep "$IP" &> /dev/null
                if [ $? -ne 0 ]; then
                    local_is_down=1   # interface was taken down (ifdown)
                fi
            done
        fi
        if [ $local_is_down -eq 1 ]; then
            corosync-cfgtool -r &> /dev/null
        fi
    fi
done &
 
~~

The script works well mostly, however, it sometimes does not work 
because the command "corosync-cfgtool -s" returns

"Could not initialize corosync configuration API error 6"

and the pacemaker process seems to be gone as well.


 THANKS!


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
  


--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] gemated service

2011-01-17 Thread Dan Frincu
Does the init script for gmetad contain the status parameter? Check what 
exit code it returns; your problem could be the init script not being 
LSB compliant.
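
A quick way to check it (a sketch; the init script path is an
assumption):

# LSB says "status" must exit 0 while the service runs and 3 once stopped
/etc/init.d/gmetad status ; echo "exit code: $?"
/etc/init.d/gmetad stop
/etc/init.d/gmetad status ; echo "exit code: $?"   # anything but 3 here misleads the cluster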


hth

jiaju liu wrote:


I use command
crm configure primitive cfs_monitor lsb:gmetad meta
resource-stickiness=1 op monitor  timeout=15 interval=15 op
force-reload timeout=15
 
to start a resource gmetad. "crm resource show" shows

cfs_monitor (lsb:gmetad) Started
and
service gmetad status shows
gmetad is stopped
my pacemaker version is
pacemaker-1.0.9.1-1.el5
pacemaker-libs-1.0.9.1-1.el5
pacemaker-libs-devel-1.0.9.1-1.el5


 



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
  


--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] pacemaker support lampp

2011-01-04 Thread Dan Frincu

Hi,

jiaju liu wrote:






>Yes offcause.

>Php runs inside apache, so you make rules for mysql and apache.

>Best regards
>Allan Jacobsen

Thank you for your reply, I haven't found lampp resource agent, so
this means I must write a resource agent for lampp? would you
please tell me more about what I should do ? Thanks a lot


No, you need to read what i wrote, lampp is not one piece of
software, so you need to control apache and mysql.

There is a nice howto here:
http://www.clusterlabs.org/wiki/DRBD_MySQL_HowTo
 
Hi,

I did as the website says, however the resource could not start. The
command:
primitive drbd_mysql ocf:linbit:drbd \
params drbd_resource="r0" \
op monitor interval="15s"
primitive fs_mysql ocf:heartbeat:Filesystem \
params device="/dev/drbd/by-res/r0" directory="/service/"
fstype="ext3"
primitive ip_mysql ocf:heartbeat:IPaddr2 \
params ip="10.100.100.200" nic="eth0"
primitive mysqld lsb:mysql
primitive apache lsb:apache2

group mysql fs_mysql ip_mysql mysqld apache
ms ms_drbd_mysql drbd_mysql \
meta master-max="1" master-node-max="1" clone-max="2"
clone-node-max="1" notify="true"
colocation mysql_on_drbd inf: mysql ms_drbd_mysql:Master
order mysql_after_drbd inf: ms_drbd_mysql:promote mysql:start
 
Is this OK? My packages are pacemaker-1.0.5-4.1, openais-0.80.5-15.1 and

xampp-linux-1.7.2.tar.gz. Have you tried it before? Thanks a lot


 
You do realize that the discussion about openais-0.80 not being 
supported has been brought to your attention in previous threads several 
months ago with the suggestion to upgrade to a newer version => 
http://clusterlabs.org/rpm/


Also the LSB init scripts (at least for MySQL) aren't working properly. 
You should use the ocf:heartbeat:mysql and ocf:heartbeat:apache scripts 
for controlling the resources. For apache, since xampp usually puts its 
files in /opt, you might have to change the paths in the apache 
primitive to match the new location, if you're using the xampp 
installation guide.
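
As a sketch only (the xampp paths below are assumptions, check where your
installation actually put the files):

primitive apache ocf:heartbeat:apache \
        params configfile="/opt/lampp/etc/httpd.conf" \
               httpd="/opt/lampp/bin/httpd" \
        op monitor interval="60s"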


HTH,
Dan



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
  


--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] apache resource agent own listen port information

2011-01-03 Thread Dan Frincu

crm ra info ocf:heartbeat:apache | grep port
export OCF_ROOT=/usr/lib/ocf
cd /usr/lib/ocf/resource.d/heartbeat
./apache meta-data | grep port
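
The monitor needs to know where to probe, so as a hedged sketch (it
assumes httpd.conf already carries "Listen 800" and that mod_status is
enabled):

primitive apache ocf:heartbeat:apache \
        params configfile="/etc/httpd/conf/httpd.conf" port="800" \
               statusurl="http://localhost:800/server-status" \
        op monitor interval="60s" timeout="120s"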

hth

jiaju liu wrote:


apache's default listen port is 80, which is used by IIC, so I have
to change the apache listen port to 800. When I use service httpd
start it is OK, however when I use pacemaker to start it, it fails
and shows: apache_start_0 (node=mds1, call=57, rc=1,
status=complete): unknown error
The apache agent I use is ocf:heartbeat:apache. Should I change
some listen port information in the RA?


 



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
  


--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] lampp support

2010-12-06 Thread Dan Frincu

Hi,

By lampp you mean Linux, Apache, MySQL, PHP/Perl/Python?

And what's stopping you from using the apache and mysql RA's? You want 
some kind of all-in-wonder? Try a resource group.
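
A minimal sketch of such a group (the xampp paths and parameters are
assumptions, see the RA metadata for the full list):

primitive apache ocf:heartbeat:apache \
        params configfile="/opt/lampp/etc/httpd.conf" httpd="/opt/lampp/bin/httpd"
primitive mysql ocf:heartbeat:mysql \
        params binary="/opt/lampp/bin/mysqld_safe" config="/opt/lampp/etc/my.cnf"
group lampp mysql apache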


HTH

jiaju liu wrote:


Hi all
I want to add lampp into my HA cluster; unfortunately, I have not
found a lampp RA. Does this mean I should write an RA myself, or is my
resource-agents version too old?

my packages version are

 


cluster-glue-libs-devel-1.0.5-1.el5
cluster-glue-1.0.5-1.el5
cluster-glue-libs-1.0.5-1.el5

 


heartbeat-libs-3.0.3-2.el5
heartbeat-devel-3.0.3-2.el5
heartbeat-3.0.3-2.el5 


openais-1.1.0-1.el5
openaislib-1.1.0-1.el5
openaislib-devel-1.1.0-1.el5

 


corosynclib-devel-1.2.2-1.1.el5
corosynclib-1.2.2-1.1.el5
corosync-1.2.2-1.1.el5

 


pacemaker-libs-1.0.8-6.1.el5
pacemaker-1.0.8-6.1.el5
pacemaker-libs-devel-1.0.8-6.1.el5


 



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
  


--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] UDPU transport patch added, when will the RPMs be available

2010-11-22 Thread Dan Frincu

Hi Steven,

Steven Dake wrote:

On 11/19/2010 11:42 AM, Andrew Beekhof wrote:
  

On Fri, Nov 19, 2010 at 11:38 AM, Dan Frincu  wrote:


Hi,

The subject is pretty self-explanatory but I'll ask anyway, the patch for
UDPU has been released, this adds the ability to set unicast peer addresses
of nodes in a cluster, in network environments where multicast is not an
option. When will it be available as an RPM?
  

When upstream does a new release.




Dan,

The flatiron branch (containing the udpu patches) is going through
testing for 1.3.0.  We find currently that single CPU virtual machine
systems seem to have problems with these patches which we will sort out
before release.

Regards
-steve


  
I've taken the (tip I think it is called) of corosync.git and compiled 
the RPM's on RH5U3 64-bit (I got the code the day it was first released, 
haven't had a chance to post yet).


# git show
commit 565b32c2621c08f82cab57420217060d100d4953
Author: Fabio M. Di Nitto 
Date:   Fri Nov 19 09:21:47 2010 +0100

There were some issues when compiling, deps mostly, and some in the spec 
related to the version, which was UNKNOWN; I did a sed, placed 1.2.9 as a 
number instead of UNKNOWN, and it compiled OK. I've installed it on two 
Xen VM's I use for testing and found some issues, so the question is: 
where can I send feedback (and what kind of feedback is required) about 
development code? I'm not saying that you guys haven't run into these 
errors, maybe you did and they were fixed, and maybe some are specific to 
my setup and haven't been found, so if I can provide some feedback on 
development code, I'd be more than happy to, if that's OK.


Also, I've read about the cluster test suite, but I'm not actually sure 
how it works, could somebody provide some details as to how I can use 
the cluster test suite on a cluster to check for issues and then how can 
I report if there are any issues found (again, what kind of feedback is 
required).


Regards,
Dan

p.s.: ignore my other email, I didn't see the reply on this one.


If I'm barking up the wrong tree, please direct me to the proper channel to
direct this request, I'm really looking forward to testing the UDPU.

Regards,

Dan

--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs:
http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

  

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
  


--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] UDPU transport patch added, when will the RPMs be available

2010-11-22 Thread Dan Frincu

Hi,

Andrew Beekhof wrote:

On Fri, Nov 19, 2010 at 11:38 AM, Dan Frincu  wrote:
  

Hi,

The subject is pretty self-explanatory but I'll ask anyway, the patch for
UDPU has been released, this adds the ability to set unicast peer addresses
of nodes in a cluster, in network environments where multicast is not an
option. When will it be available as an RPM?



When upstream does a new release.

  
So who/where do I ask when that will take place? Is there a road map 
detailing this?


TIA

Regards,
Dan

If I'm barking up the wrong tree, please direct me to the proper channel to
direct this request, I'm really looking forward to testing the UDPU.

Regards,

Dan

--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs:
http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
  


--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] UDPU transport patch added, when will the RPMs be available

2010-11-19 Thread Dan Frincu

Hi,

The subject is pretty self-explanatory but I'll ask anyway, the patch 
for UDPU has been released, this adds the ability to set unicast peer 
addresses of nodes in a cluster, in network environments where multicast 
is not an option. When will it be available as an RPM?


If I'm barking up the wrong tree, please direct me to the proper channel 
to direct this request, I'm really looking forward to testing the UDPU.


Regards,

Dan

--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] crm resource restart fails to restart the service

2010-11-17 Thread Dan Frincu
k: 
Invoking the PE: query=17432, ref=pe_calc-dc-1290005471-17355, seq=116, 
quorate=1
Nov 17 07:51:11 cluster1 pengine: [25687]: notice: unpack_config: On 
loss of CCM Quorum: Ignore
Nov 17 07:51:11 cluster1 pengine: [25687]: info: unpack_config: Node 
scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Nov 17 07:51:11 cluster1 pengine: [25687]: info: 
determine_online_status: Node cluster1 is online
Nov 17 07:51:11 cluster1 pengine: [25687]: info: 
determine_online_status: Node cluster2 is online
Nov 17 07:51:11 cluster1 crmd: [25688]: info: do_state_transition: State 
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS 
cause=C_IPC_MESSAGE origin=handle_response ]
Nov 17 07:51:11 cluster1 crmd: [25688]: info: unpack_graph: Unpacked 
transition 17188: 0 actions in 0 synapses
Nov 17 07:51:11 cluster1 crmd: [25688]: info: do_te_invoke: Processing 
graph 17188 (ref=pe_calc-dc-1290005471-17355) derived from 
/var/lib/pengine/pe-input-27465.bz2
Nov 17 07:51:11 cluster1 crmd: [25688]: info: run_graph: 

Nov 17 07:51:11 cluster1 crmd: [25688]: notice: run_graph: Transition 
17188 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pengine/pe-input-27465.bz2): Complete
Nov 17 07:51:11 cluster1 crmd: [25688]: info: te_graph_trigger: 
Transition 17188 is now complete
Nov 17 07:51:11 cluster1 crmd: [25688]: info: notify_crmd: Transition 
17188 status: done - 
Nov 17 07:51:11 cluster1 crmd: [25688]: info: do_state_transition: State 
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
cause=C_FSA_INTERNAL origin=notify_crmd ]
Nov 17 07:51:11 cluster1 crmd: [25688]: info: do_state_transition: 
Starting PEngine Recheck Timer
Nov 17 07:51:11 cluster1 pengine: [25687]: info: process_pe_message: 
Transition 17188: PEngine Input stored in: 
/var/lib/pengine/pe-input-27465.bz2


Regards,

Dan

Vadym

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
  


--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] crm resource restart fails to restart the service

2010-11-17 Thread Dan Frincu

Hi,

r...@cluster1:/# pgrep mysql
961
1127
r...@cluster1:/# crm resource restart mysqld
r...@cluster1:/# pgrep -fl mysql
961
1127

The restart command doesn't actually restart the process. I have tried 
this with another custom-built, OCF-compliant RA and have the same issue.


# rpm -qa | egrep '(pacemaker|corosync|resource-agents)'
pacemaker-1.0.9.1-1.el5
resource-agents-1.0.3-2.el5
corosync-1.2.7-1.1.el5

# crm configure show mysqld
primitive mysqld ocf:heartbeat:mysql \
   params binary="/usr/bin/mysqld_safe" config="/etc/my.cnf" 
enable_creation="0" datadir="/mysql/database" user="root" 
test_user="monitor" test_passwd="monitor" test_table="cluster.monitor" \

   op monitor interval="10s" timeout="5s" \
   op start interval="0s" \
   op stop interval="0s" \
   meta target-role="Started"

Ideas?

Regards,
Dan

--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Problems setting up clone resource syslog-ng - I need help

2010-11-17 Thread Dan Frincu
op monitor interval="10s" role="Master" timeout="240s"
primitive drbd_disk1 ocf:linbit:drbd \
params drbd_resource="pnp4nagios" \
op monitor interval="20s" role="Slave" timeout="240s" \
op monitor interval="10s" role="Master" timeout="240s"
primitive drbd_disk2 ocf:linbit:drbd \
params drbd_resource="services" \
op monitor interval="20s" role="Slave" timeout="240s" \
op monitor interval="10s" role="Master" timeout="240s"
primitive fs_drbd ocf:heartbeat:Filesystem \
params device="/dev/drbd0" directory="/usr/local/nagios"
fstype="ext3" \
op monitor interval="15s" \
op start interval="0" timeout="240s" \
op stop interval="0" timeout="360s"
primitive fs_drbd1 ocf:heartbeat:Filesystem \
params device="/dev/drbd1" directory="/usr/local/pnp4nagios"
fstype="ext3" \
op monitor interval="15s" \
op start interval="0" timeout="240s" \
op stop interval="0" timeout="360s"
primitive fs_drbd2 ocf:heartbeat:Filesystem \
params device="/dev/drbd2" directory="/services/etc"
fstype="ext3" \
op monitor interval="15s" \
op start interval="0" timeout="240s" \
op stop interval="0" timeout="360s" \
meta target-role="Started"
primitive ip1 ocf:heartbeat:IPaddr2 \
params ip="192.168.1.112" nic="eth5" cidr_netmask="24" \
op monitor interval="10s" timeout="20s" \
meta target-role="Started"
primitive mailto ocf:heartbeat:MailTo \
params email="m...@domain.com" \
op monitor interval="10" timeout="10" depth="0" \
meta target-role="Started"
primitive nagios ocf:naprax:nagios \
params configfile="/usr/local/nagios/etc/nagios.cfg"
nagios="/usr/local/nagios/bin/nagios" \
op monitor interval="15s" \
op start interval="0" timeout="240s" \
op stop interval="0" timeout="240s" \
meta target-role="Started"
primitive pingd ocf:pacemaker:pingd \
params host_list="host1 host2 host3 host4" multiplier="100"
dampen="5s" \
op monitor interval="15s" timeout="5s"
primitive syslog-ng ocf:heartbeat:syslog-ng \
params configfile="/etc/syslog-ng/syslog-ng.conf"
syslog_ng_binary="/usr/sbin/syslog-ng" \
op monitor interval="60s" timeout="60s" depth="0"
group nagios-group fs_drbd fs_drbd1 fs_drbd2 ip1 apache2 nagios mailto \
meta target-role="Started"
ms ms_drbd drbd_disk \
meta master-max="1" master-node-max="1" clone-max="2"
clone-node-max="1" notify="true" target-role="Started"
ms ms_drbd1 drbd_disk1 \
meta master-max="1" master-node-max="1" clone-max="2"
clone-node-max="1" notify="true"
ms ms_drbd2 drbd_disk2 \
meta master-max="1" master-node-max="1" clone-max="2"
clone-node-max="1" notify="true"
clone pingdclone pingd \
meta globally-unique="false" target-role="Started"
clone syslog-ng-clone syslog-ng \
meta globally-unique="false" target-role="Started"
location cli-prefer-ip1 nagios-group \
        rule $id="cli-prefer-rule-ip1" inf: #uname eq lxds07 and #uname eq lxds05
location nagios-group_on_connected_node nagios-group \
        rule $id="pingd-rule" pingd: defined pingd
colocation drbd_on_disks inf: ms_drbd ms_drbd1 ms_drbd2 nagios-group
order mount_after_drbd inf: ms_drbd:promote nagios-group:start
order mount_after_drbd1 inf: ms_drbd1:promote nagios-group:start
order mount_after_drbd2 inf: ms_drbd2:promote nagios-group:start
property $id="cib-bootstrap-options" \
        dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
resource-stickiness="100"

##
##


Regards

Stefan



  

___
Pacemaker mailing list: Pac

Re: [Pacemaker] using xml for rules

2010-11-15 Thread Dan Frincu

Pavlos Parissis wrote:

On 15 November 2010 08:14, Andrew Beekhof  wrote:
  

On Thu, Nov 11, 2010 at 2:10 PM, Pavlos Parissis
 wrote:


I removed "score=2" from

  

Have a look at the schema file:
  
http://hg.clusterlabs.org/pacemaker/stable-1.0/raw-file/tip/xml/pacemaker.rng.in
  http://hg.clusterlabs.org/pacemaker/stable-1.0/raw-file/tip/xml/rule.rng.in

For starters: s/boolean_op/boolean-op/



that was it! thanks a lot, now  I have to see if it is works as expected.


BTW, any suggestion for a good xml editor which I can use to validate
configuration?
  

The crm shell? :)

Thanks again,
Pavlos

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
  


--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] 2 node failover cluster + MySQL Master-Master replica setup

2010-11-14 Thread Dan Frincu

Ruzsinszky Attila wrote:

That's what I said - I didn't see it either.
but if you you check the current RA:


What do you think about this:
http://www.lathiat.net/files/MySQL%20-%20DRBD%20&%20Pacemaker.pdf

I can't see if this is a real M-M or M-S setup.
  

It's a Master-Slave setup.

This is the PDF I mentioned.

Offtopic: The setup in that PDF is pretty basic; I think the person who 
wrote the document and myself share a lot of common views related to the 
configuration. However, I would advise using "drbdadm -- --clear-bitmap 
new-current-uuid mysql" instead of "drbdadm -- --overwrite-data-of-peer 
primary mysql", as the latter will start a synchronization process, which 
is pointless in this case: the DRBD block device is empty, so it would be 
synchronizing empty space, while the former synchronizes both servers' 
partitions "instantly" (starting from version 8.3).

Also, I'm impressed to see naming like "ms ms_drbd_mysql drbd_mysql", 
"colocation mysql_on_drbd inf: mysql ms_drbd_mysql:Master", "order 
mysql_after_drbd inf: ms_drbd_mysql:promote mysql:start" in official 
documents, as this is the naming I use as well when defining primitives, 
colocation and ordering constraints. I know it's not much, and it really 
doesn't matter how you name the resources and constraints as long as they 
are syntactically correct, but I just couldn't get used to the resource 
naming used in the DRBD documentation. Sorry guys, you do awesome work, 
but 'primitive p_drbd_r0 ocf:linbit:drbd params drbd_resource="r0"', 
"colocation c_drbd_r0-U_on_drbd_r0 inf: ms_drbd_r0-U ms_drbd_r0:Master" 
and other such naming confused the life out of me :)
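
For reference, the whole bring-up of a fresh, empty resource would then go
something like this (a sketch; "mysql" is the resource name from the PDF
and it assumes DRBD 8.3 or later):

# on both nodes
drbdadm create-md mysql
drbdadm up mysql
# on one node only, once both sides show Connected/Inconsistent
drbdadm -- --clear-bitmap new-current-uuid mysql
# both sides are now UpToDate and no full sync takes place
drbdadm primary mysql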


Sorry for the Offtopic.

Regards,

Dan

TIA,
Ruzsi

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
  


--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] 2 node failover cluster + MySQL Master-Master replica setup

2010-11-14 Thread Dan Frincu



So I guess there are 2 ways for a MS setup with MySQL.


OK.
And where is a cookbook for setting up M-S config?
  
Have you even read that PDF, it documents just that, a MS setup with 
MySQL ...

Why not M-M?
  

You have an obsession, you should see a doctor about that.

--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] 2 node failover cluster + MySQL Master-Master replica setup

2010-11-14 Thread Dan Frincu

Hi,

I am pretty sure Linbit announced mysql RA with replication capabilities. 
Haven't seen documentation though.

# crm ra meta mysql|grep ^replica
replication_user (string): MySQL replication user
replication_passwd (string): MySQL replication user password
replication_port (string, [3306]): MySQL replication port
  
You're probably using a newer version of resource-agents, I have 
resource-agents-1.0.3-2.el5 and:


# crm ra meta mysql|grep ^replica
# echo $?
1

I've found the patches for the MySQL RA though

http://hydra.azilian.net/gitweb/?p=linux-ha/.git;a=summary

And the original thread

http://www.mail-archive.com/linux...@lists.linux-ha.org/msg14992.html

The patches apply for a Master-Slave Replication setup, haven't tested 
them though.

So now almost the only possibility is DRBD+MySQL?
  

So I guess there are 2 ways for a MS setup with MySQL.

Regards,

Dan

--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] 2 node failover cluster + MySQL Master-Master replica setup

2010-11-12 Thread Dan Frincu

Hi,

Ruzsinszky Attila wrote:

Hi,

  

And now (officially) RHCS can also use Pacemaker
http://theclusterguy.clusterlabs.org/post/1551292286


Nice.

  

Yeah, like I said, Master-Master and Pacemaker without a proper resource
agent will cause issues.


Yes.

  

big problems. Now let me explain this, a 2-node Multi-Master MySQL setup
means setting up every node as both Master and Slave, node 1's Master
replicates asynchronously to node 2's Slave and node 2's Master replicates
asynchronously to node 1's Slave. The replication channels between the two
are not redundant, nor do they recover from failure automatically and you
have to manually set the auto-increment-increment and auto-increment-offset
so that you don't have primary key collisions.


Clear.

  

each server. Looking at how DRBD handles these kinds of things is one way to
go about it, but ... it's a huge task and there are a lot of things that can
go terribly wrong.


:-(

  

So again, for the third time, the problem is not the Multi-Master setup, nor
it is Pacemaker, it's just a very specific use case for which a resource
agent wasn't written.


OK.
So now almost the only possibility is DRBD+MySQL?
  
Afaik, yes, I'm hoping someone will step in and say otherwise, but to 
the best of my knowledge, the only implementation between MySQL and 
Pacemaker is represented by the mysql and mysql-proxy resource agents 
http://www.linux-ha.org/wiki/Resource_Agents


The mysql RA controls a single MySQL instance and the rest of the HA 
setup is done via DRBD Master-Slave resources.


Regards,

Dan

TIA,
Ruzsi

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
  


--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] 2 node failover cluster + MySQL Master-Master replica setup

2010-11-12 Thread Dan Frincu

Hi,

Ruzsinszky Attila wrote:

Hi,

  

A MySQL Multi-Master architecture for a 2 node setup brings a lot of
configuration and administration overhead and has no conflict detection or
resolution. Integrating such a setup with Pacemaker only adds to the


Yes, I found it.
The real story: I want to learn clustering with a 2 node failover cluster.
I configured the cluster by DMC (DRBD Management Console).
I used the GUI configuring a MySQL service. It was almost unsuccessfull
which wasn't a surprise for me. After that I started to read some HowTo,
WEB page, etc. for help. I found someone from #mysql-nbd channel who
helped me and adviced me using M-M MySQL config but he doesn't know
almost anything about Pacemaker (He uses RH cluster).
  
And now (officially) RHCS can also use Pacemaker 
http://theclusterguy.clusterlabs.org/post/1551292286

After we did the working M-M config I started pacemaker and I could see
MySQL was working. I could connect to the common IP and I could create a
test DB. Everything seemed all right until I put the master node in standby
(from pacemaker's point of view). At that moment mysqld started "blinking"
between working and not working states because pacemaker kept restarting
the process.

In the messages file I could see some lines about missing privs (RELOAD
and SUPER).

So I'm here now.
  
Yeah, like I said, Master-Master and Pacemaker without a proper resource 
agent will cause issues.
  

server. Even the LSB script doesn't handle a Multi-Master setup. You'd have
to write a custom resource agent, and it would probably fit your setup and
your setup alone, meaning it couldn't be widely used for other setups, I
know I had to make some modifications to the mysql resource agent and those
changes were specific to my setup.


No, I don't want to write scripts. I'm not a programmer. I just want
to try out a new tech for MySQL clustering other than MySQL+DRBD. It is
clear to me theoretically. The files of mysqld reside on the common dir,
which is switched over by DRBD. Is that right?
  

Yes.
  

MySQL Cluster is a choice, it could be integrated with Pacemaker, although I


Now I don't want MySQL Cluster. I think it is a bigger task for me.

  

Anyways, this is just to get a feel for what's involved in the process, and
how Pacemaker would fit the picture, at least from my point of view.


OK

  

I would recommend all questions related to MySQL Cluster, Replication,
Multi-Master be directed to the appropriate mailing lists though, and if you


As I mentioned I've got an M-M config from #mysql-nbd.
The recent problem is MySQL (M-M) + Pacemaker.
  
Back to square one, don't pass go, don't collect $200. No resource 
agent, big problems. Now let me explain this, a 2-node Multi-Master 
MySQL setup means setting up every node as both Master and Slave, node 
1's Master replicates asynchronously to node 2's Slave and node 2's 
Master replicates asynchronously to node 1's Slave. The replication 
channels between the two are not redundant, nor do they recover from 
failure automatically and you have to manually set the 
auto-increment-increment and auto-increment-offset so that you don't 
have primary key collisions.
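
For illustration, the conventional my.cnf settings for such a pair look
something like this (textbook values, not taken from this thread):

# node 1
[mysqld]
server-id                = 1
log-bin                  = mysql-bin
auto-increment-increment = 2
auto-increment-offset    = 1

# node 2
[mysqld]
server-id                = 2
log-bin                  = mysql-bin
auto-increment-increment = 2
auto-increment-offset    = 2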


Now imagine a resource agent and what it should do to keep the resources 
up. First you need to check periodically (monitor) the replication 
channels, if they fail, you must determine which node has the most 
recent information and make sure that first its information is sent to 
the other node via the Slave replication channel, then activate the 
reverse Master-Slave channel, otherwise you'd be in a MySQL 
'split-brain' situation, where each node has information written to it 
and the database now contains different views on each server. Looking at 
how DRBD handles these kinds of things is one way to go about it, but 
... it's a huge task and there are a lot of things that can go terribly 
wrong.
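
Just to sketch the monitor half of it, this is roughly the kind of check
such an agent would have to run on each node (the account name and the
exit-code mapping are made up for the example):

    # probe both replication threads; a real agent would map a failure
    # to OCF_ERR_GENERIC and trigger the fail-over logic described above
    status=$(mysql -ucluster -psecret -e 'SHOW SLAVE STATUS\G')
    io=$(echo "$status" | awk '/Slave_IO_Running:/ {print $2}')
    sql=$(echo "$status" | awk '/Slave_SQL_Running:/ {print $2}')
    if [ "$io" != "Yes" ] || [ "$sql" != "Yes" ]; then
        exit 1
    fi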


So again, for the third time, the problem is not the Multi-Master setup, 
nor it is Pacemaker, it's just a very specific use case for which a 
resource agent wasn't written.


Regards,

Dan
  

want to write a resource agent for a Multi-Master setup, by all means, do
share :)


No, I don't want. I'm a beginner both in clustering and MySQL.

  

Hope this helps.


Yes, of course.

BTW.
If I want to solve the above problem can you help me? 
Of course with my

strict error messages, config files, etc. I "feel" my M-M config is not
rock stable (I was able to break the IO or SQL "channel" between the
two mysqld processes) so I don't know whether I want this type of setup.

TIA,
Ruzsi

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf

Re: [Pacemaker] 2 node failover cluster + MySQL Master-Master replica setup

2010-11-12 Thread Dan Frincu

Hi,

Ruzsinszky Attila wrote:

You're not making sense, first you say MySQL Master-Master, then you
mention master mysqld on clusterB and slave mysqld on clusterA. So,
which one is it:


Yes, it is true. If I stop openais and start mysql without openais, the config
is M-M (or Multi-Master).

When pacemaker starts the mysql processes I can see master and slave mysqld
text in crm_mon.

  

- MySQL Master-Master (or Multi-Master) which can be achieved via MySQL
Replication
- MySQL Master-Slave, which can be achieved via MySQL Replication as well


I'd like to implement the above. I don't know which one is right for me.
Because of the M-M MySQL config I think the 1st one is my choice.
  
A MySQL Multi-Master architecture for a 2 node setup brings a lot of 
configuration and administration overhead and has no conflict detection 
or resolution. Integrating such a setup with Pacemaker only adds to the 
overhead, as the current resource agents only handle a standalone MySQL 
server. Even the LSB script doesn't handle a Multi-Master setup. You'd 
have to write a custom resource agent, and it would probably fit your 
setup and your setup alone, meaning it couldn't be widely used for other 
setups, I know I had to make some modifications to the mysql resource 
agent and those changes were specific to my setup.


MySQL Cluster is a choice, it could be integrated with Pacemaker, 
although I don't actually see the benefits in this case, meaning MySQL 
Cluster would be the database backend, on its own, doing its job, and to 
that backend you could connect from multiple frontends, put a load 
balancer (or two) before the frontends and you've got quite the setup, 
and the frontends and load balancer could be controlled by Pacemaker. 
But MySQL Cluster has its downsides as well: it needs a minimum of 4 
nodes (it could probably work with less but that's the general 
recommendation), 2 data nodes, one SQL node and one management node. The 
SQL and management roles could be collocated on one physical node + 2 
data nodes = 3 nodes.


Anyways, this is just to get a feel for what's involved in the process, 
and how Pacemaker would fit the picture, at least from my point of view.


I would recommend all questions related to MySQL Cluster, Replication, 
Multi-Master be directed to the appropriate mailing lists though, and if 
you want to write a resource agent for a Multi-Master setup, by all 
means, do share :)


Hope this helps.

Regards,
Dan
  

- MySQL Master with a DRBD backend (even MySQL docs recommend this type
of setup for some use cases) in which the MySQL instance runs only where
DRBD is primary


I think I know this setup and don't want it now.

  

- MySQL Cluster (nothing to do with Pacemaker, although they can be put
together in a setup)


This would be the next test if I have enough time.

TIA,
Ruzsi

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
  


--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Infinite fail-count and migration-threshold after node fail-back

2010-11-11 Thread Dan Frincu

Hi,

Pavlos Parissis wrote:

On 11 November 2010 13:04, Dan Frincu  wrote:
  

Hi,

Andrew Beekhof wrote:

On Mon, Oct 11, 2010 at 9:40 AM, Dan Frincu  wrote:


Hi all,

I've managed to make this setup work, basically the issue with a
symmetric-cluster="false" and specifying the resources' location manually
means that the resources will always obey the location constraint, and (as
far as I could see) disregard the rsc_defaults resource-stickiness values.


This definitely should not be the case.
Possibly your stickiness setting is being eclipsed by the combination
of the location constraint scores.
Try INFINITY instead.



I understand your point and I believe also this to be the case, however I've
noticed that by specifying symmetric-cluster="false" for each resource I
need to add 2 location constraints, which overcrowds the config, and if I
want (and hope) to go to a config with multiple servers and resources, each
with specific rules, then also adding location constraints for each resource
is an overhead which I'd rather not include, if possible.



From the documentation [1]
6.2.2. Asymmetrical "Opt-In" Clusters
To create an opt-in cluster, start by preventing resources from
running anywhere by default
crm_attribute --attr-name symmetric-cluster --attr-value false
Then start enabling nodes. The following fragment says that the web
server prefers sles-1, the database prefers sles-2 and both can
failover to sles-3 if their most preferred node fails.

Example 6.1. Example set of opt-in location constraints
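
In crm shell terms that example boils down to something like the
following (the constraint IDs are made up):

    location loc-web-1 Webserver 200: sles-1
    location loc-web-2 Webserver 0: sles-3
    location loc-db-1 Database 200: sles-2
    location loc-db-2 Database 0: sles-3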

At the moment you have symmetric-cluster=false, you need to add
location constraints in order to get your resources running.
Below is my conf and it works as expected, pbx_service_01 starts on
node-01 and never fails back, in case failed over to node-03 and
node-01 is back on line, due to resource-stickiness="1000", but take a
look at the score in location constraint, very low scores compared to
1000 - I could  have also set it to inf
  
Yes but you don't have groups defined in your setup, having groups means 
the score of each active resource is added. 
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/ch-advanced-resources.html#id2220530


For example:

r...@cluster1:~# ptest -sL
Allocation scores:
group_color: all allocation score on cluster1: 0
group_color: all allocation score on cluster2: -100
group_color: virtual_ip_1 allocation score on cluster1: 1000
group_color: virtual_ip_1 allocation score on cluster2: -100
group_color: virtual_ip_2 allocation score on cluster1: 1000
group_color: virtual_ip_2 allocation score on cluster2: 0
group_color: Failover_Alert allocation score on cluster1: 1000
group_color: Failover_Alert allocation score on cluster2: 0
group_color: fs_home allocation score on cluster1: 1000
group_color: fs_home allocation score on cluster2: 0
group_color: fs_mysql allocation score on cluster1: 1000
group_color: fs_mysql allocation score on cluster2: 0
group_color: fs_storage allocation score on cluster1: 1000
group_color: fs_storage allocation score on cluster2: 0
group_color: httpd allocation score on cluster1: 1000
group_color: httpd allocation score on cluster2: 0
group_color: mysqld allocation score on cluster1: 1000
group_color: mysqld allocation score on cluster2: 0
clone_color: ms_drbd_home allocation score on cluster1: 9000
clone_color: ms_drbd_home allocation score on cluster2: -100
clone_color: drbd_home:0 allocation score on cluster1: 1100
clone_color: drbd_home:0 allocation score on cluster2: 0
clone_color: drbd_home:1 allocation score on cluster1: 0
clone_color: drbd_home:1 allocation score on cluster2: 1100
native_color: drbd_home:0 allocation score on cluster1: 1100
native_color: drbd_home:0 allocation score on cluster2: 0
native_color: drbd_home:1 allocation score on cluster1: -100
native_color: drbd_home:1 allocation score on cluster2: 1100
drbd_home:0 promotion score on cluster1: 18100
drbd_home:1 promotion score on cluster2: -100
clone_color: ms_drbd_mysql allocation score on cluster1: 10100
clone_color: ms_drbd_mysql allocation score on cluster2: -100
clone_color: drbd_mysql:0 allocation score on cluster1: 1100
clone_color: drbd_mysql:0 allocation score on cluster2: 0
clone_color: drbd_mysql:1 allocation score on cluster1: 0
clone_color: drbd_mysql:1 allocation score on cluster2: 1100
native_color: drbd_mysql:0 allocation score on cluster1: 1100
native_color: drbd_mysql:0 allocation score on cluster2: 0
native_color: drbd_mysql:1 allocation score on cluster1: -100
native_color: drbd_mysql:1 allocation score on cluster2: 1100
drbd_mysql:0 promotion score on cluster1: 20300
drbd_mysql:1 promotion score on cluster2: -100
clone_color: ms_drbd_storage allocation score on cluster1: 11200
clone_color: ms_drbd_storage allocation score on cluster2: -100
clone_color: drbd_storage:0 allocation score on cluster1:

Re: [Pacemaker] Infinite fail-count and migration-threshold after node fail-back

2010-11-11 Thread Dan Frincu

Hi,

Andrew Beekhof wrote:

On Mon, Oct 11, 2010 at 9:40 AM, Dan Frincu  wrote:
  

Hi all,

I've managed to make this setup work, basically the issue with a
symmetric-cluster="false" and specifying the resources' location manually
means that the resources will always obey the location constraint, and (as
far as I could see) disregard the rsc_defaults resource-stickiness values.



This definitely should not be the case.
Possibly your stickiness setting is being eclipsed by the combination
of the location constraint scores.
Try INFINITY instead.

  
I understand your point and I believe also this to be the case, however 
I've noticed that by specifying symmetric-cluster="false" for each 
resource I need to add 2 location constraints, which overcrowds the 
config, and if I want (and hope) to go to a config with multiple servers 
and resources, each with specific rules, then also adding location 
constraints for each resource is an overhead which I'd rather not 
include, if possible.


Thank you for your reply, I will take it under advisement.

Regards,

Dan

This behavior is not the expected one. In theory, setting
symmetric-cluster="false" should only affect whether resources are allowed to run
anywhere by default, and resource-stickiness should lock the
resources in place so they don't bounce from node to node. Again, this didn't happen,
but by setting symmetric-cluster="true", using the same ordering and
collocation constraints and the same resource-stickiness, the behavior is the
expected one.

I don't remember the docs on clusterlabs.org mentioning anywhere
that resource-stickiness only works with
symmetric-cluster="true", so for anyone who also stumbles upon this issue,
I hope this helps.
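
For reference, the stickiness default itself is a one-liner in crm shell
(the value is whatever suits your setup):

    crm configure rsc_defaults resource-stickiness=1000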

Regards,

Dan

Dan Frincu wrote:


Hi,

Since it was brought to my attention that I should upgrade from
openais-0.80 to a more recent version of corosync, I've done just that,
however I'm experiencing a strange behavior on the cluster.

The same setup was used with the below packages:

# rpm -qa | grep -Ei "(openais|cluster|heartbeat|pacemaker|resource)"
openais-0.80.5-15.2
cluster-glue-1.0-12.2
pacemaker-1.0.5-4.2
cluster-glue-libs-1.0-12.2
resource-agents-1.0-31.5
pacemaker-libs-1.0.5-4.2
pacemaker-mgmt-1.99.2-7.2
libopenais2-0.80.5-15.2
heartbeat-3.0.0-33.3
pacemaker-mgmt-client-1.99.2-7.2

Now I've migrated to the most recent stable packages I could find (on the
clusterlabs.org website) for RHEL5:

# rpm -qa | grep -Ei "(openais|cluster|heartbeat|pacemaker|resource)"
cluster-glue-1.0.6-1.6.el5
pacemaker-libs-1.0.9.1-1.el5
pacemaker-1.0.9.1-1.el5
heartbeat-libs-3.0.3-2.el5
heartbeat-3.0.3-2.el5
openaislib-1.1.3-1.6.el5
resource-agents-1.0.3-2.el5
cluster-glue-libs-1.0.6-1.6.el5
openais-1.1.3-1.6.el5

Expected behavior:
- all the resources the in group should go (based on location preference)
to bench1
- if bench1 goes down, resources migrate to bench2
- if bench1 comes back up, resources stay on bench2, unless manually told
otherwise.

On the previous incantation, this worked, by using the new packages, not
so much. Now if bench1 goes down (crm node standby `uname -n`), failover
occurs, but when bench1 comes back up, resources migrate back, even if
default-resource-stickiness is set, and more than that, 2 drbd block devices
reach infinite metrics, most notably because they try to promote the
resources to a Master state on bench1, but fail to do so due to the resource
being held open (by some process, I could not identify it).

Strangely enough, the resources (drbd) fail to be promoted to a Master
status on bench1, so they fail back to bench2, where they are mounted
(functional), but crm_mon shows:

Migration summary:
* Node bench2.streamwide.ro:
 drbd_mysql:1: migration-threshold=100 fail-count=100
 drbd_home:1: migration-threshold=100 fail-count=100
* Node bench1.streamwide.ro:

 infinite metrics on bench2, while the drbd resources are available

version: 8.3.2 (api:88/proto:86-90)
GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by
mockbu...@v20z-x86-64.home.local, 2009-08-29 14:07:55
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r
  ns:1632 nr:1864 dw:3512 dr:3933 al:11 bm:19 lo:0 pe:0 ua:0 ap:0 ep:1
wo:b oos:0
1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r
  ns:4 nr:24 dw:28 dr:25 al:1 bm:1 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
2: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r
  ns:4 nr:24 dw:28 dr:85 al:1 bm:1 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

and mounted

/dev/drbd1 on /home type ext3 (rw,noatime,nodiratime)
/dev/drbd0 on /mysql type ext3 (rw,noatime,nodiratime)
/dev/drbd2 on /storage type ext3 (rw,noatime,nodiratime)

Attached is the hb_report.

Thank you in advance.

Best regards

  

--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania


_

Re: [Pacemaker] 2 node failover cluster + MySQL Master-Master replica setup

2010-11-10 Thread Dan Frincu

Hi,

Attila Ruzsinszky wrote:

Hi,

Please help me!

I want to solve the above task.
I'm a beginner in cluster and MySQL.

MySQL M-M setup is working. When I start cluster sw (pacemaker+openais)
everything seems good. There is a master mysqld on clusterB node and slave
mysqld on clusterA.
  
You're not making sense, first you say MySQL Master-Master, then you 
mention master mysqld on clusterB and slave mysqld on clusterA. So, 
which one is it:
- MySQL Master-Master (or Multi-Master) which can be achieved via MySQL 
Replication

- MySQL Master-Slave, which can be achieved via MySQL Replication as well
- MySQL Master with a DRBD backend (even MySQL docs recommend this type 
of setup for some use cases) in which the MySQL instance runs only where 
DRBD is primary
- MySQL Cluster (nothing to do with Pacemaker, although they can be put 
together in a setup)



If I connect to the common "cluster" IP I can create a DB. It is good.

If I put clusterB standby, mysqld always restarts on clusterA.
In the messages file there are lines which complain about missing
SUPER and RELOAD privileges. But for which user?

Do I have to make any special pacemaker config for MySQL M-M setup?

Is there a cookbook, HowTo, anything I can use configure MySQL+Pacemaker
for my task?
  

First you need to determine what it is that you're looking for.
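
For what it's worth, SUPER and RELOAD are global privileges, so they are
granted on *.* to whatever account the replication setup connects as; a
made-up example:

    # 'repl' and the password are hypothetical; use your own account
    mysql -e "GRANT SUPER, RELOAD ON *.* TO 'repl'@'localhost' IDENTIFIED BY 'secret';"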

Regards,

Dan

I tried to get some help on the #linux-cluster, #mysql and #mysql-nbd IRC channels.
I got the working M-M setup from #mysql-nbd. The guy is not using pacemaker
at all. :-(

Environment:
Machines are running under VMware. VM guest OS: Novell SLES 11 SP1 +HA 64 bit.

TIA,
Ruzsi


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
  


--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Corosync using unicast instead of multicast

2010-11-08 Thread Dan Frincu

Hi,

Steven Dake wrote:

On 11/05/2010 01:30 AM, Dan Frincu wrote:

Hi,

Alan Jones wrote:

This question should be on the openais list, however, I happen to know
the answer.
To get up and running quickly you can configure broadcast with the
version you have.


I've done that already, however I was a little concerned as to what
Steven Dake said on the openais mailing list about using broadcast
"Broadcast and redundant ring probably don't work to well together.".

I've also done some testing and saw that the broadcast address used is
255.255.255.255, regardless of what the bindnetaddr network address is,
and quite frankly, I was hoping to see a directed broadcast address.
This wasn't the case, therefore I wonder whether this was the issue that
Steven was referring to, because by using the 255.255.255.255 as a
broadcast address, there is the slight chance that some application
running in the same network might send a broadcast packet using the same


This can happen with multicast or unicast modes as well.  If a third 
party application communicates on the multicast/port combo or unicast 
port of a cluster node, there is conflict.


With encryption, corosync encrypts and authenticates all packets, 
ignoring packets without a proper signature.  The signatures are 
difficult to spoof.  Without encryption, bad things happen in this 
condition.


For more details, read "SECURITY" file in our source distribution.

OK, I read the SECURITY file; a lot of overhead is added. I understand 
the reasons why it does it this way, and I'm not going to go into the 
details right now. Basically, enabling encryption ensures that any traffic 
going between the nodes is both encrypted and authenticated, so rogue 
messages that happen to reach the exact network socket will be discarded. 
I'll come back to this a little bit later.
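
The knob itself lives in the totem section of corosync.conf; a minimal
fragment:

    totem {
        secauth: on
        threads: 0
    }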


Then again, I have this sentence in my head that I can't seem to get rid 
of "Broadcast and redundant ring probably don't work to well together, 
broadcast and redundant ring probably don't work to well together" 
and also I read "OpenAIS now provides broadcast network communication in 
addition to multicast. This functionality is considered Technology 
Preview for standalone usage of OpenAIS", therefore I'm a little bit 
more concerned.


Can you shed some light on this please? Two questions:

1) What do you mean by "Broadcast and redundant ring probably don't work 
to well together"?


2) Is using Corosync's broadcast feature instead of multicast stable 
enough to be used in production systems?


Thank you in advance.

Best regards,

Dan

port as configured on the cluster. How would the cluster react to that,
would it ignore the packet, would it wreak havoc?

Regards,

Dan

That's my main concern right now.

Corosync can distinguish separate clusters with the multicast address
and port that become payload to the messages.
The patch you referred to can be applied to the top of tree for
corosync or you can wait for a new release 1.3.0 planned for the end
of November.
Alan

On Thu, Nov 4, 2010 at 1:02 AM, Dan Frincu  
wrote:



Hi all,

I'm having an issue with a setup using the following:
cluster-glue-1.0.6-1.6.el5.x86_64.rpm
cluster-glue-libs-1.0.6-1.6.el5.x86_64.rpm
corosync-1.2.7-1.1.el5.x86_64.rpm
corosynclib-1.2.7-1.1.el5.x86_64.rpm
drbd83-8.3.2-6.el5_3.x86_64.rpm
kmod-drbd83-8.3.2-6.el5_3.x86_64.rpm
openais-1.1.3-1.6.el5.x86_64.rpm
openaislib-1.1.3-1.6.el5.x86_64.rpm
pacemaker-1.0.9.1-1.el5.x86_64.rpm
pacemaker-libs-1.0.9.1-1.el5.x86_64.rpm
resource-agents-1.0.3-2.el5.x86_64.rpm

This is a two-node HA cluster, with the nodes interconnected via 
bonded
interfaces through the switch. The issue is that I have no control 
of the
switch itself, can't do anything about that, and from what I 
understand the

environment doesn't allow enabling multicast on the switch. In this
situation, how can I have the setup functional (with redundant rings,
rrp_mode: active) without using multicast.

I've seen that individual network sockets are formed between nodes, 
unicast
sockets, as well as the multicast sockets. I'm interested in 
knowing how

will the lack of multicast affect the redundant rings, connectivity,
failover, etc.

I've also seen this page
https://lists.linux-foundation.org/pipermail/openais/2010-October/015271.html 


And here it states using UDPU transport mode avoids using multicast or
broadcast, but it's a patch, is this integrated in any of the newer 
versions

of corosync?

Thank you in advance.

Regards,

Dan

--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania

___
Pacemaker mailing list:Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home:http://www.clusterlabs.org
Getting 
started:http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf

Bugs:
ht

Re: [Pacemaker] Corosync using unicast instead of multicast

2010-11-05 Thread Dan Frincu

Hi,

Alan Jones wrote:

This question should be on the openais list, however, I happen to know
the answer.
To get up and running quickly you can configure broadcast with the
version you have.
  
I've done that already, however I was a little concerned as to what 
Steven Dake said on the openais mailing list about using broadcast 
"Broadcast and redundant ring probably don't work to well together.". 

I've also done some testing and saw that the broadcast address used is 
255.255.255.255, regardless of what the bindnetaddr network address is, 
and quite frankly, I was hoping to see a directed broadcast address. 
This wasn't the case, therefore I wonder whether this was the issue that 
Steven was referring to, because by using the 255.255.255.255 as a 
broadcast address, there is the slight chance that some application 
running in the same network might send a broadcast packet using the same 
port as configured on the cluster. How would the cluster react to that, 
would it ignore the packet, would it wreak havoc?
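
For reference, this is the kind of interface fragment meant here when
talking about broadcast (bindnetaddr is an example value; with broadcast
enabled, mcastaddr is left out):

    interface {
        ringnumber: 0
        bindnetaddr: 192.168.1.0
        broadcast: yes
        mcastport: 5405
    }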


Regards,

Dan

That's my main concern right now.

Corosync can distinguish separate clusters with the multicast address
and port that become payload to the messages.
The patch you referred to can be applied to the top of tree for
corosync or you can wait for a new release 1.3.0 planned for the end
of November.
Alan

On Thu, Nov 4, 2010 at 1:02 AM, Dan Frincu  wrote:
  

Hi all,

I'm having an issue with a setup using the following:
cluster-glue-1.0.6-1.6.el5.x86_64.rpm
cluster-glue-libs-1.0.6-1.6.el5.x86_64.rpm
corosync-1.2.7-1.1.el5.x86_64.rpm
corosynclib-1.2.7-1.1.el5.x86_64.rpm
drbd83-8.3.2-6.el5_3.x86_64.rpm
kmod-drbd83-8.3.2-6.el5_3.x86_64.rpm
openais-1.1.3-1.6.el5.x86_64.rpm
openaislib-1.1.3-1.6.el5.x86_64.rpm
pacemaker-1.0.9.1-1.el5.x86_64.rpm
pacemaker-libs-1.0.9.1-1.el5.x86_64.rpm
resource-agents-1.0.3-2.el5.x86_64.rpm

This is a two-node HA cluster, with the nodes interconnected via bonded
interfaces through the switch. The issue is that I have no control of the
switch itself, can't do anything about that, and from what I understand the
environment doesn't allow enabling multicast on the switch. In this
situation, how can I have the setup functional (with redundant rings,
rrp_mode: active) without using multicast.

I've seen that individual network sockets are formed between nodes, unicast
sockets, as well as the multicast sockets. I'm interested in knowing how
will the lack of multicast affect the redundant rings, connectivity,
failover, etc.

I've also seen this page
https://lists.linux-foundation.org/pipermail/openais/2010-October/015271.html
And here it states using UDPU transport mode avoids using multicast or
broadcast, but it's a patch, is this integrated in any of the newer versions
of corosync?

Thank you in advance.

Regards,

Dan

--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs:
http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker





___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
  


--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] Corosync using unicast instead of multicast

2010-11-04 Thread Dan Frincu

Hi all,

I'm having an issue with a setup using the following:
cluster-glue-1.0.6-1.6.el5.x86_64.rpm
cluster-glue-libs-1.0.6-1.6.el5.x86_64.rpm
corosync-1.2.7-1.1.el5.x86_64.rpm
corosynclib-1.2.7-1.1.el5.x86_64.rpm
drbd83-8.3.2-6.el5_3.x86_64.rpm
kmod-drbd83-8.3.2-6.el5_3.x86_64.rpm
openais-1.1.3-1.6.el5.x86_64.rpm
openaislib-1.1.3-1.6.el5.x86_64.rpm
pacemaker-1.0.9.1-1.el5.x86_64.rpm
pacemaker-libs-1.0.9.1-1.el5.x86_64.rpm
resource-agents-1.0.3-2.el5.x86_64.rpm

This is a two-node HA cluster, with the nodes interconnected via bonded 
interfaces through the switch. The issue is that I have no control of 
the switch itself, can't do anything about that, and from what I 
understand the environment doesn't allow enabling multicast on the 
switch. In this situation, how can I have the setup functional (with 
redundant rings, rrp_mode: active) without using multicast.


I've seen that individual network sockets are formed between nodes, 
unicast sockets, as well as the multicast sockets. I'm interested in 
knowing how will the lack of multicast affect the redundant rings, 
connectivity, failover, etc.


I've also seen this page
https://lists.linux-foundation.org/pipermail/openais/2010-October/015271.html
And here it states using UDPU transport mode avoids using multicast or 
broadcast, but it's a patch, is this integrated in any of the newer 
versions of corosync?
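
For the record, the UDPU transport from that patch is configured roughly
like this, listing every cluster member explicitly (the addresses are
examples):

    totem {
        transport: udpu
        interface {
            ringnumber: 0
            bindnetaddr: 10.0.0.0
            mcastport: 5405
            member {
                memberaddr: 10.0.0.1
            }
            member {
                memberaddr: 10.0.0.2
            }
        }
    }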


Thank you in advance.

Regards,

Dan

--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] drbd on heartbeat links

2010-11-02 Thread Dan Frincu



Dan Frincu wrote:

Hi,

Pavlos Parissis wrote:

Hi,

I am trying to figure out how I can resolve the following scenario

Facts
3 nodes
2 DRBD ms resource
2 group resource
by default drbd1/group1 runs on node-01 and drbd2/group2 runs on node2
drbd1/group1  can only run on node-01 and node-03
drbd2/group2  can only run on node-02 and node-03
DRBD fencing_policy is resource-only [1]
2 heartbeat links and one of them used by DRBD communication

Scenario
1) node-01 loses both heartbeat links
2) DRBD monitor detects first the absence of the drbd communication
and does resource fencing by add location constraint which prevent
drbd1 to run on node3
3) pacemaker fencing kicks in and kills node-01

due to the location constraint created at step 2, drbd1/group1 cannot run in
the cluster

  
I don't understand exactly what you mean by this. Resource-only 
fencing would create a -inf score on node1 when the node loses the 
drbd communication channel (the only one drbd uses), however you could 
still have heartbeat communication available via the secondary link, 
then you shouldn't fence the entire node, 

Correction to "the resource-only fencing does that for you":
the resource-only fencing restricts a drbd resource from running on that 
node (puts it in an unconfigured state: no primary/secondary, no 
replication); it doesn't fence the node, I don't want to be misunderstood.
the only thing you need to do is to add the drbd fence handlers in 
/etc/drbd.conf.

   handlers {
   fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
   after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
   }

Is this what you meant?

Regards,

Dan


Any ideas?

Cheers,
Pavlos




[1] it is not resource-and-stonith because in the scenario where a
node has the role of primary for drbd1 and secondary for drbd2, could
be fenced because the primary node of drbd2 have in fencing_policy
resource-and-stonith

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: 
http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker 

  




--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] drbd on heartbeat links

2010-11-02 Thread Dan Frincu

Hi,

Pavlos Parissis wrote:

Hi,

I am trying to figure out how I can resolve the following scenario

Facts
3 nodes
2 DRBD ms resource
2 group resource
by default drbd1/group1 runs on node-01 and drbd2/group2 runs on node2
drbd1/group1  can only run on node-01 and node-03
drbd2/group2  can only run on node-02 and node-03
DRBD fencing_policy is resource-only [1]
2 heartbeat links and one of them used by DRBD communication

Scenario
1) node-01 loses both heartbeat links
2) DRBD monitor detects first the absence of the drbd communication
and does resource fencing by add location constraint which prevent
drbd1 to run on node3
3) pacemaker fencing kicks in and kills node-01

due to the location constraint created at step 2, drbd1/group1 cannot run in
the cluster

  
I don't understand exactly what you mean by this. Resource-only fencing 
would create a -inf score on node1 when the node loses the drbd 
communication channel (the only one drbd uses), however you could still 
have heartbeat communication available via the secondary link, then you 
shouldn't fence the entire node, the resource-only fencing does that for 
you, the only thing you need to do is to add the drbd fence handlers in 
/etc/drbd.conf.

   handlers {
   fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
   after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
   }

Is this what you meant?

Regards,

Dan


Any ideas?

Cheers,
Pavlos




[1] it is not resource-and-stonith because in the scenario where a
node has the role of primary for drbd1 and secondary for drbd2, could
be fenced because the primary node of drbd2 have in fencing_policy
resource-and-stonith

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
  


--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] cloned IPaddr2 on 4 nodes

2010-10-29 Thread Dan Frincu

Hi,

Vladimir Legeza wrote:

Hello,

On Fri, Oct 29, 2010 at 12:35 PM, Dan Frincu  wrote:


Hi,


Vladimir Legeza wrote:

Hello folks.

I'm trying to set up four IP-balanced nodes, but I didn't find the
right way to balance load between nodes when some of them have failed.

I've done:

[r...@node1 ~]# crm configure show
node node1
node node2
node node3
node node4
primitive ClusterIP ocf:heartbeat:IPaddr2 \
params ip="10.138.10.252" cidr_netmask="32"
clusterip_hash="sourceip-sourceport" \
op monitor interval="30s"
clone StreamIP ClusterIP \
meta globally-unique="true" clone-max="8"
clone-node-max="2" target-role="Started" notify="true"
ordered="true" interleave="true"
property $id="cib-bootstrap-options" \
dc-version="1.0.9-0a40fd0cb9f2fcedef9d1967115c912314c57438" \
cluster-infrastructure="openais" \
expected-quorum-votes="4" \
no-quorum-policy="ignore" \
stonith-enabled="false"

When all the nodes are up and running:

 [r...@node1 ~]# crm status

Last updated: Thu Oct 28 17:26:13 2010
Stack: openais
Current DC: node2 - partition with quorum
Version: 1.0.9-0a40fd0cb9f2fcedef9d1967115c912314c57438
4 Nodes configured, 4 expected votes
2 Resources configured.


Online: [ node1 node2 node3 node4 ]

 Clone Set: StreamIP (unique)
 ClusterIP:0(ocf::heartbeat:IPaddr2):Started node1
 ClusterIP:1(ocf::heartbeat:IPaddr2):Started node1
 ClusterIP:2(ocf::heartbeat:IPaddr2):Started node2
 ClusterIP:3(ocf::heartbeat:IPaddr2):Started node2
 ClusterIP:4(ocf::heartbeat:IPaddr2):Started node3
 ClusterIP:5(ocf::heartbeat:IPaddr2):Started node3
 ClusterIP:6(ocf::heartbeat:IPaddr2):Started node4
 ClusterIP:7(ocf::heartbeat:IPaddr2):Started node4
Everything is OK and each node takes 1/4 of all traffic - wonderful.
But we end up with 25% traffic loss if one of them goes down:

Isn't this supposed to be normal behavior in a load balancing
situation, 4 nodes receive 25% of traffic each, one node goes
down, the load balancer notices the failure and directs 33,33% of
traffic to the remaining nodes?

 
The only way I see to achieve 33...% is to decrease the clone-max param
value (which should be a multiple of the number of online nodes);

also clone-max should be changed on the fly (automatically).

hmm... The idea is very interesting. =8- )

Just out of curiosity.


[r...@node1 ~]# crm node standby node1
[r...@node1 ~]# crm status

Last updated: Thu Oct 28 17:30:01 2010
Stack: openais
Current DC: node2 - partition with quorum
Version: 1.0.9-0a40fd0cb9f2fcedef9d1967115c912314c57438
4 Nodes configured, 4 expected votes
2 Resources configured.


Node node1: standby
Online: [ node2 node3 node4 ]

 Clone Set: StreamIP (unique)
 ClusterIP:0(ocf::heartbeat:IPaddr2):Stopped
 ClusterIP:1(ocf::heartbeat:IPaddr2):Stopped
 ClusterIP:2(ocf::heartbeat:IPaddr2):Started node2
 ClusterIP:3(ocf::heartbeat:IPaddr2):Started node2
 ClusterIP:4(ocf::heartbeat:IPaddr2):Started node3
 ClusterIP:5(ocf::heartbeat:IPaddr2):Started node3
 ClusterIP:6(ocf::heartbeat:IPaddr2):Started node4
 ClusterIP:7(ocf::heartbeat:IPaddr2):Started node4

I found the solution (to prevent the loss) by setting clone-node-max to 3:

[r...@node1 ~]# crm resource meta StreamIP set clone-node-max 3
[r...@node1 ~]# crm status

Last updated: Thu Oct 28 17:35:05 2010
Stack: openais
Current DC: node2 - partition with quorum
Version: 1.0.9-0a40fd0cb9f2fcedef9d1967115c912314c57438
4 Nodes configured, 4 expected votes
2 Resources configured.


Node node1: standby
Online: [ node2 node3 node4 ]

 Clone Set: StreamIP (unique)
 ClusterIP:0(ocf::heartbeat:IPaddr2):Started node2
 ClusterIP:1(ocf::heartbeat:IPaddr2):Started node3
 ClusterIP:2(ocf::heartbeat:IPaddr2):Started node2
 ClusterIP:3(ocf::heartbeat:IPaddr2):Started node2
 ClusterIP:4(ocf::heartbeat:IPaddr2):Started node3
 ClusterIP:5(ocf::heartbeat:IPaddr2):Started node3
 ClusterIP:6(ocf::heartbeat:IPaddr2):Started node4
 ClusterIP:7(ocf::heartbeat:IPaddr2):Started node4

The problem is that nothing changes when node1 comes back online.

   

Re: [Pacemaker] cloned IPaddr2 on 4 nodes

2010-10-29 Thread Dan Frincu
  ClusterIP:6(ocf::heartbeat:IPaddr2):Started node4
 ClusterIP:7(ocf::heartbeat:IPaddr2):Started node4
There is NO TRAFFIC on node1.
If I set clone-node-max back to 2 - all nodes revert to the original state.

 

So, my question is: how can I avoid such "hand-made" changes (or is it
possible to automate clone-node-max adjustments)?


Thanks!

You could use location constraints for the clones, something like:

location loc-ip0-node1 StreamIP:0 200: node1
location loc-ip0-node2 StreamIP:0 100: node2

This way if node1 is up, it will run there, but if node1 fails it will 
move to node2. And if you don't define resource stickiness, when node1 
comes back online, the resource migrates back to it.


I haven't tested this, but it should give you a general idea about how 
it could be implemented.


Regards,

Dan




___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
  


--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Multiple independent two-node clusters side-by-side?

2010-10-29 Thread Dan Frincu

Hi,

Vadym Chepkov wrote:

On Oct 28, 2010, at 2:53 AM, Dan Frincu wrote:

  

Hi,

Andreas Ntaflos wrote:

Hi, 

first time poster, short time Pacemaker user. I don't think this is a 
very difficult question to answer but I seem to be feeding Google the 
wrong search terms. I am using Pacemaker 1.0.8 and Corosync 1.2.0 on 
Ubuntu 10.04.1 Server.


Short version: How do I configure multiple independent two-node clusters 
where the nodes are all on the same subnet? Only the two nodes that form 
the cluster should see that cluster's resources and not any other. 


Is this possible? Where should I look for more and detailed information?
  
  

You need to specify different multicast sockets for this to work. Under the 
/etc/corosync/corosync.conf you have the interface statements. Even if all servers are in 
the same subnet, you can "split them apart" by defining unique multicast 
sockets.
An example should be useful. Let's say that you have only one interface 
statement in the corosync file.
interface {
ringnumber: 0
bindnetaddr: 192.168.1.0 
mcastaddr: 239.192.168.1 
mcastport: 5405 
}

The multicast socket in this case is 239.192.168.1:5405. All nodes that should 
be in the same cluster should use the same multicast socket. In your case, the 
first two nodes should use the same multicast socket. How about the other two 
nodes? Use another unique multicast socket.
interface {
ringnumber: 0
bindnetaddr: 192.168.1.0 
mcastaddr: 239.192.168.112 
mcastport: 5405 
}

Now the multicast socket is 239.192.168.112:5405. It's unique, the network 
address is the same, but you add this config (edit according to your 
environment, this is just an example) to your other two nodes. So you have 
cluster1 formed out of node1 and node2 linked to 239.192.168.1:5405 and 
cluster2 formed out of node3 and node4 linked to 239.192.168.112:5405.

This way, the clusters don't _see_ each other, so you can reuse the resource 
ID's and see only two nodes per cluster.




Out of curiosity, RFC2365 defines "local scope" multicast space 239.255.0.0/16 and 
"organizational local scope" 239.192.0.0/14.

Seems most examples for pacemaker clusters use the latter. But since most clusters 
are not spread across different subnets, wouldn't it be more appropriate to use 
the former?

Thanks,
Vadym

  
You do realize that 239.0.0.0/8 has the same general purpose as RFC1918, 
only it references multicast addresses instead. Basically, general 
guidelines dictate usage of the 239.255.0.0/16 locally scoped address range 
(e.g.: all nodes are in the same general location, such as a building), 
but this is just like saying use 192.168.0.0/16 instead of 10.0.0.0/8. It 
really boils down to the network engineer's choice of addressing; either 
solution works, but that kind of elaborate multicast addressing 
scheme design also implies a large number of nodes in many locations, 
all under the same general administration.


Thinking that for each 2 node cluster with one communication channel you 
need one multicast address, and that you can put many nodes in the same 
cluster (where such should arise), the number of multicast addresses is 
usually small, so it makes little difference whether you choose from a 
2^16 range or from a 2^24 range of multicast addresses.


Going to another level with this, imagine you're using VLANs for each 
cluster; all of a sudden, you can use the same multicast address :)


The main concern in this case should pertain less to the addressing 
scheme and more to the interconnecting devices' support for multicast.


Just my 2 cents.

Regards,

Dan

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
  


--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Multiple independent two-node clusters side-by-side?

2010-10-27 Thread Dan Frincu

Hi,

Andreas Ntaflos wrote:
Hi, 

first time poster, short time Pacemaker user. I don't think this is a 
very difficult question to answer but I seem to be feeding Google the 
wrong search terms. I am using Pacemaker 1.0.8 and Corosync 1.2.0 on 
Ubuntu 10.04.1 Server.


Short version: How do I configure multiple independent two-node clusters 
where the nodes are all on the same subnet? Only the two nodes that form 
the cluster should see that cluster's resources and not any other. 


Is this possible? Where should I look for more and detailed information?
  
You need to specify different multicast sockets for this to work. Under 
the /etc/corosync/corosync.conf you have the interface statements. Even 
if all servers are in the same subnet, you can "split them apart" by 
defining unique multicast sockets.
An example should be useful. Let's say that you have only one interface 
statement in the corosync file.

   interface {
   ringnumber: 0
   bindnetaddr: 192.168.1.0
   mcastaddr: 239.192.168.1
   mcastport: 5405
   }
The multicast socket in this case is 239.192.168.1:5405. All nodes that 
should be in the same cluster should use the same multicast socket. In 
your case, the first two nodes should use the same multicast socket. How 
about the other two nodes? Use another unique multicast socket.

   interface {
   ringnumber: 0
   bindnetaddr: 192.168.1.0
   mcastaddr: 239.192.168.112
   mcastport: 5405
   }
Now the multicast socket is 239.192.168.112:5405. It's unique, the 
network address is the same, but you add this config (edit according to 
your environment, this is just an example) to your other two nodes. So 
you have cluster1 formed out of node1 and node2 linked to 
239.192.168.1:5405 and cluster2 formed out of node3 and node4 linked to 
239.192.168.112:5405.


This way, the clusters don't _see_ each other, so you can reuse the 
resource ID's and see only two nodes per cluster.
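
Once each pair is up, a quick way to check that a node only sees its own
cluster is to look at the ring status and the membership on that node:

    corosync-cfgtool -s
    crm_mon -1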


Regards,

Dan



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
  


--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] Add a color scheme to the editor used in crm shell

2010-10-25 Thread Dan Frincu

Hi,

As a person who spends quite a lot of time in the crm shell I have seen 
that there is a colorscheme option that can be applied when issuing crm 
configure show. I'm interested in whether there's a way to have a colorscheme 
within the editor that is used by crm. It's usually vim; I've noticed 
that I could add my .vimrc, and when using crm configure edit the 
shortcuts and everything else worked, except for the colorscheme.


Does anyone know how to add a colorscheme to the crm editor (vim for 
example) that can also do syntax highlighting while inside the crm 
configure edit?


Regards,

Dan

--
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

