Thanks for your answer, it resolves all of my doubts.

I tried this yesterday and the 2 nodes were powered off instantly (this caused some trouble on reboot, with filesystems unmounted uncleanly). Is there a way to do a clean shutdown instead of a power-off?

After some reflection, I've decided to add a third node (a simple workstation) as an arbiter, with only fencing primitives on it. Is that a good idea? Is this solution reliable?

Regards,
Bruno

On 25/07/2013 16:53, Digimer wrote:
With two-node clusters, quorum can't be used. This is fine *if* you have good fencing. If the nodes partition (ie: network failure), both will try to fence the other. In theory, the faster node will power off the other node before the slower node can kill the faster node. In practice, this isn't always the case.

IPMI (and iDRAC, etc) are independent devices. So it is possible for both nodes to initiate a power-down on the other before either dies. To avoid this, you will want to set a delay for the primary/active node's fence primitive.

Say "node1" is your active node and "node2" is your backup. You would set a delay of, say, 15 seconds against "node1". Now if there is a partition, node1 would look up how to fence node2 and immediately initiate power off. Node 2, however, would look up how to fence node1, see a 15 second delay, and start a timer before calling the power-off. Of course, node2 will die before the timer expires.

You can also disable acpid on the nodes. With acpid disabled, "pressing the power button" results in a near-instant off. If you do this, reducing your delay to 5 seconds would probably be plenty.
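
For example, on a sysvinit-based node (a RHEL/CentOS 6-era assumption; adjust for your init system):

service acpid stop      # stop acpid now
chkconfig acpid off     # keep it from starting on boot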

There is another issue to be aware of: "fence loops". The problem with two-node clusters and not using quorum is that a single node can fence the other. So let's continue our example above...

Node 2 will eventually reboot. If you have pacemaker set to start on boot, it will start, wait to connect to node1 (which it can't, because the network failure remains), call a fence to put node1 into a known state, pause for 15 seconds and then initiate a power off. Node1 dies and the services recover on node2. Now node1 boots back up, starts its pacemaker... an endless loop of fence -> recover until the network is fixed.

To avoid this, simply do not start pacemaker on boot.
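
On the same sysvinit assumption, that is just:

chkconfig pacemaker off   # the cluster stack must then be started by hand
chkconfig corosync off    # only if corosync has its own init script on your distro

You then start the cluster manually once you've verified the node (and the network) is healthy.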

As to the specifics, you can test fencing configurations easily by calling the fence agent directly at the command line. I do not use DRAC, so I can't speak to it in detail. I think you need to set lanplus and possibly define the console prompt to expect.

Using generic IPMI as an example:

fence_ipmilan -a 192.168.100.1 -l ipmiuser -p ipmipwd -o status
fence_ipmilan -a 192.168.100.2 -l ipmiuser -p ipmipwd -o status

If this returns the power state, then it is simple to convert to a pacemaker config.

configure primitive pStN1 stonith:fence_ipmilan params \
 ipaddr=192.168.100.1 login=ipmiuser passwd=ipmipwd delay=15 \
 op monitor interval=60s
configure primitive pStN2 stonith:fence_ipmilan params \
 ipaddr=192.168.100.2 login=ipmiuser passwd=ipmipwd \
 op monitor interval=60s
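
One thing not shown above: it is common to also add location constraints so that each stonith resource avoids running on the node it fences (a node can't reliably power itself off). A sketch in the same crm shell syntax, assuming pStN1 is the device that fences node1:

configure location lStN1 pStN1 -inf: node1
configure location lStN2 pStN2 -inf: node2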

Again, I *think* you need to set a couple of extra options for DRAC. Experiment at the command line before moving to the pacemaker config. Once you have the command line version working, you should be able to set it up in pacemaker. If you have trouble though, share the CLI call and we can help with the pacemaker config.
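
For what it's worth, with fence_ipmilan the lanplus option is the -P switch, so a first test against an iDRAC might look like this (untested on my side, since I don't have DRAC hardware):

fence_ipmilan -a 192.168.100.1 -l ipmiuser -p ipmipwd -P -o status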

On 25/07/13 05:39, Bruno MACADRÉ wrote:
Some modifications to my first mail:

After some research I found that external/ipmi isn't available on my
system, so I must use fence-agents.

My second question must be modified to reflect these changes, like this:

     configure primitive pStN1 stonith:fence_ipmilan params \
         ipaddr=192.168.100.1 login=ipmiuser passwd=ipmipwd
     configure primitive pStN2 stonith:fence_ipmilan params \
         ipaddr=192.168.100.2 login=ipmiuser passwd=ipmipwd

Regards,
Bruno

On 25/07/2013 10:39, Bruno MACADRÉ wrote:
Hi,

    I've just built a two-node Active/Passive cluster to have an iSCSI failover SAN.

    Some details about my configuration :

        - I have two nodes, each with 2 bonds: 1 for DRBD replication and 1 for communication
        - The iSCSI target, iSCSI LUN and virtual IP are constrained together to start on the DRBD master node

    Everything works fine, but now I need to configure fencing. I have 2 Dell PowerEdge servers with iDRAC6.

    First question: is 'external/drac5' compatible with iDRAC6? (I've found contradictory information about this...)

    Second question: is this configuration sufficient (with IPMI)?

        configure primitive pStN1 stonith:external/ipmi params \
            hostname=node1 ipaddr=192.168.100.1 userid=ipmiuser \
            passwd=ipmipwd interface=lan
        configure primitive pStN2 stonith:external/ipmi params \
            hostname=node2 ipaddr=192.168.100.2 userid=ipmiuser \
            passwd=ipmipwd interface=lan
        location lStN1 pStN1 inf: node1
        location lStN2 pStN2 inf: node2

        And after that:
        configure property stonith-enabled=true
        configure property stonith-action=poweroff

    Third (and last) question: what about quorum? At the moment I have 'no-quorum-policy="ignore"', but that's a risk, isn't it?

    Don't hesitate to ask me for more information if needed,

    Regards,
    Bruno.

--

Bruno MACADRE
-------------------------------------------------------------------
 Ingénieur Systèmes et Réseau     | Systems and Network Engineer
 Département Informatique         | Department of computer science
 Responsable Info SER             | SER IT Manager
 Université de Rouen              | University of Rouen
-------------------------------------------------------------------
Coordonnées / Contact :
        Université de Rouen
        Faculté des Sciences et Techniques - Madrillet
        Avenue de l'Université - BP12
        76801 St Etienne du Rouvray CEDEX
        FRANCE

        Tél : +33 (0)2-32-95-51-86
-------------------------------------------------------------------

