[Linux-HA] Antw: Re: Q: How does crm locate RAs?

2011-07-07 Thread Ulrich Windl
Florian Haas florian.h...@linbit.com wrote on 06.07.2011 at 21:51 in message 4e14bca8.7070...@linbit.com: (Your MUA seems to have injected = toward the end of most lines. May want to have a look at fixing that.) On 07/06/2011 05:40 PM, Ulrich Windl wrote: Hi! As I've

Re: [Linux-HA] stonith-ng reboot returned 1

2011-07-07 Thread Lars Marowsky-Bree
On 2011-07-06T15:06:01, Craig Lesle craig.le...@bruden.com wrote: Interesting that st_timeout does not show 75 seconds on any try and looks rather random, like it's calculated. ... right. I hadn't noticed that before. So what's happening is that, in pacemaker's fencing/remote.c, the

Re: [Linux-HA] Antw: Re: Q: How does crm locate RAs?

2011-07-07 Thread Florian Haas
On 2011-07-07 08:23, Ulrich Windl wrote: Florian Haas florian.h...@linbit.com wrote on 06.07.2011 at 21:51 in message 4e14bca8.7070...@linbit.com: (Your MUA seems to have injected = toward the end of most lines. May want to have a look at fixing that.) On 07/06/2011 05:40 PM,

[Linux-HA] iscsi not configured

2011-07-07 Thread spamvoll
Hi, I'm new and setting up my first HA cluster. Corosync and Pacemaker are running fine and all IPs get moved; DRBD works, but iSCSI drives me crazy. My config: ... primitive iscsiLUN ocf:heartbeat:iSCSILogicalUnit \ params path=/dev/drbd0 target_iqn=iqn.2011-06.de.my-domain.viki:drbd0 lun=0

[Linux-HA] Forkbomb not initiating failover

2011-07-07 Thread James Smith
Hi, Summary: Two node cluster running DRBD, IET with a floating IP and stonith enabled. All this works well, I can kernel panic the machine, kill individual PIDs (for example IET) which then invoke failover. However, when I forkbomb the master, nothing happens. The box is dead, the services

Re: [Linux-HA] iscsi not configured

2011-07-07 Thread Michael Schwartzkopff
Hi, I'm new and setting up my first HA cluster. Corosync and Pacemaker are running fine and all IPs get moved; DRBD works, but iSCSI drives me crazy. My config: ... primitive iscsiLUN ocf:heartbeat:iSCSILogicalUnit \ params path=/dev/drbd0

Re: [Linux-HA] Forkbomb not initiating failover

2011-07-07 Thread Florian Haas
On 2011-07-07 11:59, James Smith wrote: Hi, Summary: Two node cluster running DRBD, IET with a floating IP and stonith enabled. All this works well, I can kernel panic the machine, kill individual PIDs (for example IET) which then invoke failover. However, when I forkbomb the master,

Re: [Linux-HA] iscsi not configured

2011-07-07 Thread Florian Haas
On 2011-07-07 11:48, spamv...@googlemail.com wrote: Hi, I'm new and setting up my first HA cluster. Corosync and Pacemaker are running fine and all IPs get moved; DRBD works, but iSCSI drives me crazy. My config: ... primitive iscsiLUN ocf:heartbeat:iSCSILogicalUnit \ params

Re: [Linux-HA] stonith-ng reboot returned 1

2011-07-07 Thread Craig Lesle
... right. I hadn't noticed that before. So what's happening is that, in pacemaker's fencing/remote.c, the stonith-timeout specified is divided up in 10% for _querying_ the list of nodes a given stonith device can retrieve, and 90% for then performing an actual operation. (Compare
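
The 10%/90% split described above can be illustrated with a little arithmetic. This is only a sketch of the proportions mentioned in the message, not the code in fencing/remote.c; the function name split_stonith_timeout and the use of integer milliseconds with floor rounding are assumptions made for the example.

```python
def split_stonith_timeout(total_ms: int) -> tuple[int, int]:
    """Split a stonith-timeout into (query_ms, operation_ms).

    Hypothetical helper: 10% of the budget goes to querying which
    devices can fence the node, the remainder to the operation itself.
    The rounding used by pacemaker is not shown in the thread; floor
    division is an assumption.
    """
    if total_ms < 0:
        raise ValueError("timeout must be non-negative")
    query_ms = total_ms // 10          # 10% for the query phase
    operation_ms = total_ms - query_ms  # remaining 90% for the operation
    return query_ms, operation_ms


if __name__ == "__main__":
    query_ms, operation_ms = split_stonith_timeout(75_000)
    print(f"query: {query_ms} ms, operation: {operation_ms} ms")
```

Under these assumptions, a configured timeout of 75 seconds leaves less than 75 seconds for the fencing operation itself, which would be consistent with st_timeout never showing the full configured value.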

Re: [Linux-HA] Forkbomb not initiating failover

2011-07-07 Thread James Smith
Hi, I appreciate that, but it doesn't answer the question. What I'm getting at, is there are multiple scenarios where a system can fail but in my test scenario I was forcing high load. My application wouldn't, in a working scenario, ever cause this type of load unless there was a very

Re: [Linux-HA] Forkbomb not initiating failover

2011-07-07 Thread Florian Haas
On 2011-07-07 13:52, James Smith wrote: Hi, I appreciate that, but it doesn't answer the question. Then maybe I misunderstood the question. I had interpreted it to mean "why doesn't my cluster automatically fail over under high load?" -- perhaps you can rephrase to clarify. What I'm getting

Re: [Linux-HA] stonith-ng reboot returned 1

2011-07-07 Thread Lars Marowsky-Bree
On 2011-07-07T05:40:23, Craig Lesle craig.le...@bruden.com wrote: Interesting. It would seem more intuitive for remote.c to add 10% to the specified value in order to get its querying overhead accounted for. Now that I know about the query tax, will verify stonith-timeout is set to a

Re: [Linux-HA] stonith-ng reboot returned 1

2011-07-07 Thread Andrew Beekhof
On Thu, Jul 7, 2011 at 5:40 PM, Lars Marowsky-Bree l...@suse.de wrote: On 2011-07-06T15:06:01, Craig Lesle craig.le...@bruden.com wrote: Interesting that st_timeout does not show 75 seconds on any try and looks rather random, like it's calculated. ... right. I hadn't noticed that before.

Re: [Linux-HA] ERROR: glib: ucast: error binding socket. Retrying: Address already in use

2011-07-07 Thread Andrew Beekhof
There is some way to tell the system not to hand out 696 for use by other daemons. It's been a long time since I did it, though, so I forget the details (even who is handing it out, possibly rpc). On Thu, Jul 7, 2011 at 10:16 AM, Hai Tao taoh...@hotmail.com wrote: I got this error (ERROR: glib: