Re: [Linux-HA] Need HA Help - standby / online not switching automatically

2011-05-20 Thread Randy Katz
Hi Lars,

Thank you for the tools to look at things. However, on a whim, before 
getting into them (DRBD was looking fine in that scenario), I decided to 
just run through the install on a different pair of VMs, making sure I used 
the gitco.de repository when it came to drbd83 and the clusterlabs repo for 
pacemaker (heartbeat and everything else comes with it once the libesmtp 
requirement is settled, in this case by using a later epel install: rpm 
-ivh epel-release-5-4.noarch.rpm):

Using the exact same crm configuration (except standby is "off" on both 
VMs, of course), when I do the same crm node standby on one node, the other 
takes over, and then back again, no problem. I am going to go back and 
either reinstall the other pair and/or compare each and every rpm and source 
to see which is broken, or just document my install procedure.

Now off to learn what you mentioned about crm resource move, thanks again.
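
For anyone following along, a minimal sketch of that command pair (crm shell
syntax; WebServices is the resource group from the config further down this
thread). "move" works by inserting a location constraint that pins the
resource, and "unmove" deletes that constraint again:

# crm resource move WebServices ha2.iohost.com
# crm resource unmove WebServices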

Regards,
Randy


On 5/20/2011 1:03 AM, Lars Ellenberg wrote:
> On Thu, May 19, 2011 at 11:53:24PM -0700, Randy Katz wrote:
>> Lars,
>>
>> Thank you much for the answer on the "standby" issue.
>> It seems that that was the tip of my real issue. So now I have both nodes
>> coming online. And it seems ha1 starts fine with all the resources starting.
>>
>> With them both online, if I issue: crm node standby ha1.iohost.com
> Why.
>
> Learn about "crm resource move".
> (and unmove, for that matter).
>
>> Then I see IP Takeover on ha2 but the other resources do not start,
>> ever, it remains:
>>
>> Node ha1.iohost.com (b159178d-c19b-4473-aa8e-13e487b65e33): standby
>> Online: [ ha2.iohost.com ]
>>
>>Resource Group: WebServices
>>ip1        (ocf::heartbeat:IPaddr2):    Started ha2.iohost.com
>>ip1arp     (ocf::heartbeat:SendArp):    Started ha2.iohost.com
>>fs_webfs   (ocf::heartbeat:Filesystem): Stopped
>>fs_mysql   (ocf::heartbeat:Filesystem): Stopped
>>apache2    (lsb:httpd):                 Stopped
>>mysql      (ocf::heartbeat:mysql):      Stopped
>>Master/Slave Set: ms_drbd_mysql
>>Slaves: [ ha2.iohost.com ]
>>Stopped: [ drbd_mysql:0 ]
>>Master/Slave Set: ms_drbd_webfs
>>Slaves: [ ha2.iohost.com ]
>>Stopped: [ drbd_webfs:0 ]
>>
>> In looking in the recent log I see this: May 20 12:46:42 ha2.iohost.com
>> pengine: [3117]: info: native_color: Resource fs_webfs cannot run anywhere
>>
>> I am not sure why it cannot promote the other resources on ha2, I
>> checked drbd before putting ha1 on standby and it was up to date.
> Double check the status of drbd:
> # cat /proc/drbd
>
> Check what the cluster would do, and why:
> # ptest -LVVV -s
> [add more Vs to see more detail, but brace yourself for maximum confusion ;-)]
>
> Check for constraints that get in the way:
> # crm configure show | grep -Ee 'location|order'
>
> check the "master scores" in the cib:
> # cibadmin -Ql -o status | grep master
>
> Look at the actions that have been performed on the resource,
> on both nodes:
> vv-- the ID of your primitive
> # grep "lrmd:.*drbd_mysql" /var/log/ha.log
> or wherever that ends up on your box
>
>> Here are the surrounding log entries, the only thing I changed in the
>> config is standby="off" on both nodes:
>>
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: group_print:  
>> Resource Group: WebServices
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: native_print:  
>> ip1  (ocf::heartbeat:IPaddr2):   Started ha2.iohost.com
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: native_print:  
>> ip1arp   (ocf::heartbeat:SendArp):   Started ha2.iohost.com
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: native_print:  
>> fs_webfs (ocf::heartbeat:Filesystem): Stopped
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: native_print:
>> fs_mysql (ocf::heartbeat:Filesystem): Stopped
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: native_print:
>> apache2  (lsb:httpd): Stopped
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: native_print:
>> mysql    (ocf::heartbeat:mysql): Stopped
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: clone_print:  
>> Master/Slave Set: ms_drbd_mysql
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: short_print:  
>> Slaves: [ ha2.iohost.com ]
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: short_print:  
>> Stopped: [ drbd_mysql:0 ]
>> May 20 12:

Re: [Linux-HA] Need HA Help - standby / online not switching automatically

2011-05-19 Thread Randy Katz

Regards,
Randy

On 5/19/2011 11:19 PM, Lars Ellenberg wrote:
> On Thu, May 19, 2011 at 03:46:37PM -0700, Randy Katz wrote:
>> To clarify, I was not seeking a quick response. I just noticed the
>> threads I searched
>> were NEVER answered, with the problem that I reported. That being said
>> and about standby:
>>
>> Why does my node come up as standby and not as online?
> Because you put it there.
>
> The standby setting (as a few others) can take a "lifetime",
> and usually that defaults to "forever", though you can explicitly
> specify an "until reboot", which actually means until restart of the
> cluster system on that node.
>
>> Is there a setting in my conf file that affects that,
>> or is it some other configuration issue? Please advise.
>>
>> Thanks,
>> Randy
>>
>> PS - Here are some threads where it seems they were never answered, one
>> going back 3 years:
>>
>> http://www.mail-archive.com/linux-ha@lists.linux-ha.org/msg09886.html
>> http://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg07663.html
>> http://lists.community.tummy.com/pipermail/linux-ha/2008-August/034310.html
> Then they probably have been solved off list, via IRC or support,
> or by the original user finally having a facepalm experience.
>
> Besides, yes, it happens that threads go unanswered, most of the time
> because the question was badly asked ("does not work. why?"), and those
> that could figure it out have been distracted by more important things,
> or decided that, at that time, trying to figure it out was too time
> consuming.
>
> That's life.
>
> If it happens to you, do a friendly bump,
> and/or try to ask a smarter version of the question ;-)
>
> Most of the time, the answer is in the logs, and the config.
>
> But please break down the issue to a minimal configuration,
> and post that minimal config plus logs of one "incident".
> Don't post your 2 MB xml config, plus a 2G log,
> and expect people to dig through that for fun.
>
> BTW, none of the quoted threads has anything to do with your experience,
> afaics.
>
>> On 5/19/2011 3:16 AM, Lars Ellenberg wrote:
>>> On Wed, May 18, 2011 at 09:55:00AM -0700, Randy Katz wrote:
>>>> ps - I searched a lot online and I see this issue coming up,
>>> I doubt that _this_ issue comes up that often ;-)
>>>
>>>> and then after about 3-4 emails they request the resources and
>>>> constraints and then there is never an answer to the thread, why?!
>>> Hey, it's not even a day since you provided the config.
>>> People have day jobs.
>>> People get _paid_ to do support on these kinds of things,
>>> so they probably first deal with requests by paying customers.
>>>
>>> If you need SLAs, you may need to check out a support contract.
>>>
>>> Otherwise you need to be patient.
>>>
>>>
>>>>  From what I read, you probably just have misunderstood some concepts.
>>> "Standby" is not what I think you think it is ;-)
>>>
>>> "Standby" is NOT for deciding where resources will be placed.
>>>
>>> "Standby" is for manually switching a node into a mode where it WILL NOT
>>> run any resources. And it WILL NOT leave that state by itself.
>>> It is not supposed to.
>>>
>>> You switch a node into standby if you want to do maintenance on that
>>> node, do major software, system or hardware upgrades, or otherwise
>>> expect that it won't be useful to run resources there.
>>>
>>> It won't even run DRBD secondaries.
>>> It will run nothing there.
>>>
>>>
>>> If you want automatic failover, DO NOT put your nodes in standby.
>>> Because, if you do, they can not take over resources.
>>>
>>> You have to have your nodes online for any kind of failover to happen.
>>>
>>> If you want to have a "preferred" location for your resources,
>>> use location constraints.
>>>
>>>
>>> Does that help?
>>>
>>>
>>>

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
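
As an aside to Lars's note on lifetimes above: the crm shell accepts the
lifetime as an argument to the standby command (a sketch; "reboot" means
until the cluster stack restarts on that node, and the default is "forever"):

# crm node standby ha1.iohost.com reboot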


Re: [Linux-HA] Need HA Help - standby / online not switching automatically

2011-05-19 Thread Randy Katz
To clarify, I was not seeking a quick response. I just noticed the 
threads I searched
were NEVER answered, with the problem that I reported. That being said 
and about standby:

Why does my node come up as standby and not as online? Is there a 
setting in my conf file that affects that, or is it some other 
configuration issue? Please advise.

Thanks,
Randy

PS - Here are some threads where it seems they were never answered, one 
going back 3 years:

http://www.mail-archive.com/linux-ha@lists.linux-ha.org/msg09886.html
http://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg07663.html
http://lists.community.tummy.com/pipermail/linux-ha/2008-August/034310.html



On 5/19/2011 3:16 AM, Lars Ellenberg wrote:
> On Wed, May 18, 2011 at 09:55:00AM -0700, Randy Katz wrote:
>> ps - I searched a lot online and I see this issue coming up,
> I doubt that _this_ issue comes up that often ;-)
>
>> and then after about 3-4 emails they request the resources and
>> constraints and then there is never an answer to the thread, why?!
> Hey, it's not even a day since you provided the config.
> People have day jobs.
> People get _paid_ to do support on these kinds of things,
> so they probably first deal with requests by paying customers.
>
> If you need SLAs, you may need to check out a support contract.
>
> Otherwise you need to be patient.
>
>
> > From what I read, you probably just have misunderstood some concepts.
>
> "Standby" is not what I think you think it is ;-)
>
> "Standby" is NOT for deciding where resources will be placed.
>
> "Standby" is for manually switching a node into a mode where it WILL NOT
> run any resources. And it WILL NOT leave that state by itself.
> It is not supposed to.
>
> You switch a node into standby if you want to do maintenance on that
> node, do major software, system or hardware upgrades, or otherwise
> expect that it won't be useful to run resources there.
>
> It won't even run DRBD secondaries.
> It will run nothing there.
>
>
> If you want automatic failover, DO NOT put your nodes in standby.
> Because, if you do, they can not take over resources.
>
> You have to have your nodes online for any kind of failover to happen.
>
> If you want to have a "preferred" location for your resources,
> use location constraints.
>
>
> Does that help?
>
>
>
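
A location constraint of the kind Lars means would look something like this
in the crm shell (a sketch; the id and score are illustrative, and
WebServices is the group from the config posted earlier in this thread):

# crm configure location prefer-ha1 WebServices 100: ha1.iohost.com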

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Need HA Help - standby / online not switching automatically

2011-05-18 Thread Randy Katz
ps - I searched a lot online and I see this issue coming up, and then 
after about 3-4 emails they
request the resources and constraints and then there is never an answer 
to the thread, why?!

On 5/17/2011 10:00 PM, Randy Katz wrote:
> Configs as follows, drbd.conf:
>
> global {
>   usage-count no;
>   # minor-count dialog-refresh disable-ip-verification
> }
>
> resource r0 {
>   protocol C;
>   syncer {
>   rate 4M;
>   }
>   startup {
>   wfc-timeout 15;
>   degr-wfc-timeout 60;
>   }
>   net {
>   cram-hmac-alg sha1;
>   shared-secret "vz2700.1";
>   }
>   on ha1.iohost.com {
>   device /dev/drbd0;
>   disk /dev/vg0/mysql;
>   address 10.1.1.197:7788;
>   meta-disk internal;
>   }
>   on ha2.iohost.com {
>   device /dev/drbd0;
>   disk /dev/vg0/mysql;
>   address 10.1.1.187:7788;
>   meta-disk internal;
>   }
> }
> resource r1 {
>   protocol C;
>   syncer {
>   rate 4M;
>   }
>   startup {
>   wfc-timeout 15;
>   degr-wfc-timeout 60;
>   }
>   net {
>   cram-hmac-alg sha1;
>   shared-secret "vz2700.1";
>   }
>   on ha1.iohost.com {
>   device /dev/drbd1;
>   disk /dev/vg0/html;
>   address 10.1.1.197:7789;
>   meta-disk internal;
>   }
>   on ha2.iohost.com {
>   device /dev/drbd1;
>   disk /dev/vg0/html;
>   address 10.1.1.187:7789;
>   meta-disk internal;
>   }
> }
>
> ha.cf:
>
> # Logging
> debug   1
> use_logd    false
> logfile /var/log/heartbeat.log
> logfacility daemon
>
> # Misc Options
> traditional_compression off
> compression bz2
> coredumps   true
>
> # Communications
> udpport 691
> bcast   eth0
> autojoin    any
>
> # Thresholds (in seconds)
> keepalive   1
> warntime    6
> deadtime    10
> initdead    15
> crm yes
>
> crm dump:
>
> node $id="b159178d-c19b-4473-aa8e-13e487b65e33" ha1.iohost.com \
>   attributes standby="off"
> node $id="b7f38306-1cb7-4f8d-8ff2-4332bbea6e78" ha2.iohost.com \
>   attributes standby="on"
> primitive apache2 lsb:httpd \
>   op start interval="0" timeout="60" \
>   op stop interval="0" timeout="120" start-delay="15" \
>   meta target-role="Started"
> primitive drbd_mysql ocf:linbit:drbd \
>   params drbd_resource="r0" \
>   op monitor interval="15s"
> primitive drbd_webfs ocf:linbit:drbd \
>   params drbd_resource="r1" \
>   op monitor interval="15s"
> primitive fs_mysql ocf:heartbeat:Filesystem \
>   params device="/dev/drbd/by-res/r0" directory="/var/lib/mysql"
> fstype="ext3" \
>   op start interval="0" timeout="60" \
>   op stop interval="0" timeout="120" \
>   meta target-role="Started"
> primitive fs_webfs ocf:heartbeat:Filesystem \
>   params device="/dev/drbd/by-res/r1" directory="/var/www/html"
> fstype="ext3" \
>   op start interval="0" timeout="60" \
>   op stop interval="0" timeout="120" \
>   meta target-role="Started"
> primitive ip1 ocf:heartbeat:IPaddr2 \
>   params ip="72.9.147.196" nic="eth0" \
>   op monitor interval="5s"
> primitive ip1arp ocf:heartbeat:SendArp \
>   params ip="72.9.147.196" nic="eth0"
> primitive mysql ocf:heartbeat:mysql \
>   params binary="/usr/bin/mysqld_safe" config="/etc/my.cnf"
> user="mysql" group="mysql" log="/var/log/mysqld.log"
> pid="/var/run/mysqld/mysqld.pid" datadir="/var/li
> b/mysql" socket="/var/lib/mysql/mysql.sock" \
>   op monitor interval="30s" timeout="30s" \
>   op start interval="0" timeout="120" \
>   op stop interval="0" timeout="120"
> group WebServices ip1 ip1

Re: [Linux-HA] HA Nodes Port 691 UDP

2011-05-18 Thread Randy Katz
OK, thanks; and I see that 694 is the port reserved for ha-cluster.
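
The port heartbeat uses is whatever udpport is set to in ha.cf (691 in the
config posted in the other thread), so the firewall needs a matching rule on
both nodes. An iptables sketch, assuming heartbeat broadcasts on eth0:

# iptables -A INPUT -i eth0 -p udp --dport 691 -j ACCEPT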

On 5/18/2011 7:17 AM, mike wrote:
> On 11-05-18 09:31 AM, Randy Katz wrote:
>> Hi, does anyone on this list know why there are UDP requests on port 691
>> of the HA nodes? I turned on firewalling and my crm_mon would not show
>> both nodes' status until I allowed UDP port 691 to flow through, please
>> advise,
>>
>> Regards,
>> Randy
> What is in your ha.cf file? Communications on my clusters take place
> over port 693. Maybe you set yours to 691??

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


[Linux-HA] HA Nodes Port 691 UDP

2011-05-18 Thread Randy Katz
Hi, does anyone on this list know why there are UDP requests on port 691
of the HA nodes? I turned on firewalling and my crm_mon would not show
both nodes' status until I allowed UDP port 691 to flow through. Please 
advise.

Regards,
Randy
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Need HA Help - standby / online not switching automatically

2011-05-17 Thread Randy Katz
Configs as follows, drbd.conf:

global {
 usage-count no;
 # minor-count dialog-refresh disable-ip-verification
}

resource r0 {
 protocol C;
 syncer {
 rate 4M;
 }
 startup {
 wfc-timeout 15;
 degr-wfc-timeout 60;
 }
 net {
 cram-hmac-alg sha1;
 shared-secret "vz2700.1";
 }
 on ha1.iohost.com {
 device /dev/drbd0;
 disk /dev/vg0/mysql;
 address 10.1.1.197:7788;
 meta-disk internal;
 }
 on ha2.iohost.com {
 device /dev/drbd0;
 disk /dev/vg0/mysql;
 address 10.1.1.187:7788;
 meta-disk internal;
 }
}
resource r1 {
 protocol C;
 syncer {
 rate 4M;
 }
 startup {
 wfc-timeout 15;
 degr-wfc-timeout 60;
 }
 net {
 cram-hmac-alg sha1;
 shared-secret "vz2700.1";
 }
 on ha1.iohost.com {
 device /dev/drbd1;
 disk /dev/vg0/html;
 address 10.1.1.197:7789;
 meta-disk internal;
 }
 on ha2.iohost.com {
 device /dev/drbd1;
 disk /dev/vg0/html;
 address 10.1.1.187:7789;
 meta-disk internal;
 }
}

ha.cf:

# Logging
debug   1
use_logd    false
logfile /var/log/heartbeat.log
logfacility daemon

# Misc Options
traditional_compression off
compression bz2
coredumps   true

# Communications
udpport 691
bcast   eth0
autojoin    any

# Thresholds (in seconds)
keepalive   1
warntime    6
deadtime    10
initdead    15
crm yes

crm dump:

node $id="b159178d-c19b-4473-aa8e-13e487b65e33" ha1.iohost.com \
 attributes standby="off"
node $id="b7f38306-1cb7-4f8d-8ff2-4332bbea6e78" ha2.iohost.com \
 attributes standby="on"
primitive apache2 lsb:httpd \
 op start interval="0" timeout="60" \
 op stop interval="0" timeout="120" start-delay="15" \
 meta target-role="Started"
primitive drbd_mysql ocf:linbit:drbd \
 params drbd_resource="r0" \
 op monitor interval="15s"
primitive drbd_webfs ocf:linbit:drbd \
 params drbd_resource="r1" \
 op monitor interval="15s"
primitive fs_mysql ocf:heartbeat:Filesystem \
 params device="/dev/drbd/by-res/r0" directory="/var/lib/mysql" 
fstype="ext3" \
 op start interval="0" timeout="60" \
 op stop interval="0" timeout="120" \
 meta target-role="Started"
primitive fs_webfs ocf:heartbeat:Filesystem \
 params device="/dev/drbd/by-res/r1" directory="/var/www/html" 
fstype="ext3" \
 op start interval="0" timeout="60" \
 op stop interval="0" timeout="120" \
 meta target-role="Started"
primitive ip1 ocf:heartbeat:IPaddr2 \
 params ip="72.9.147.196" nic="eth0" \
 op monitor interval="5s"
primitive ip1arp ocf:heartbeat:SendArp \
 params ip="72.9.147.196" nic="eth0"
primitive mysql ocf:heartbeat:mysql \
 params binary="/usr/bin/mysqld_safe" config="/etc/my.cnf" 
user="mysql" group="mysql" log="/var/log/mysqld.log" 
pid="/var/run/mysqld/mysqld.pid" datadir="/var/li
b/mysql" socket="/var/lib/mysql/mysql.sock" \
 op monitor interval="30s" timeout="30s" \
 op start interval="0" timeout="120" \
 op stop interval="0" timeout="120"
group WebServices ip1 ip1arp fs_webfs fs_mysql apache2 mysql \
 meta target-role="Started"
ms ms_drbd_mysql drbd_mysql \
 meta master-max="1" master-node-max="1" clone-max="2" 
clone-node-max="1" notify="true"
ms ms_drbd_webfs drbd_webfs \
 meta master-max="1" master-node-max="1" clone-max="2" 
clone-node-max="1" notify="true"
colocation apache2_with_ip inf: apache2 ip1
colocation apache2_with_mysql inf: apache2 ms_drbd_mysql:Master
colocation apache2_with_webfs inf: apache2 ms_drbd_webfs:Master
colocation fs_on_drbd inf: fs_mysql ms_drbd_mysql:Master
colocation ip_with_ip_arp inf: ip1 ip1arp
colocation mysqlfs_on_drbd inf: fs_mysql ms_drbd_mysql:Master
colocation webfs_on_drbd inf: fs_webfs ms_drbd_webfs:Master
order apache-after-webfs inf: fs_webfs:start apache2:start
order arp-after-ip inf: ip1:start ip1arp:start
order fs-mysql-after-drbd inf: ms_drbd_mysql:promote fs_mysql:start
order fs-webfs-after-drbd inf: ms_drbd_webfs:promote fs_webfs:start
order mysql-after-fs-mysql inf: fs_mysql:start mysql:start
property $id="cib-bootstrap-options" \
 dc-version="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87" \
 cluster-infrastructure="Heartbeat" \
 expected-quorum-votes="1" \
 stonith-enabled="false" \
 no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
 resource-stickiness="100"
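
Note that fs_on_drbd and mysqlfs_on_drbd above are the same constraint under
two ids; one of the two is redundant and could be dropped (crm shell sketch):

# crm configure delete fs_on_drbd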


On 5/17/2011 9:53 PM, Michael Schwartzkopff wrote:
>> If I do on ha2: crm node online ha2.iohost.com i

Re: [Linux-HA] Need HA Help - standby / online not switching automatically

2011-05-17 Thread Randy Katz
If I do on ha2: crm node online ha2.iohost.com, it starts the VIP (it will 
ping), but it does not do the DRBD mounts and does not start the web or 
mysql services. If I then issue crm node online ha1.iohost.com on ha1, it 
will make ha2 online with all services active! Then if I make ha2 standby, 
ha1 will become online with all services, just fine!

Any insights will be greatly appreciated, thanks!

Randy

On 5/17/2011 9:44 PM, Randy Katz wrote:
> In the logs, on ha2, I see at the time crm node standby ha1:
>
> May 18 10:32:54 ha2.iohost.com cib: [2378]: info: write_cib_contents:
> Archived previous version as /var/lib/heartbeat/crm/cib-25.raw
> May 18 10:32:54 ha2.iohost.com cib: [2378]: info: write_cib_contents:
> Wrote version 0.102.0 of the CIB to disk (digest:
> b445d9afde4b209981c3da08d4c24ecc)
> May 18 10:32:54 ha2.iohost.com cib: [2378]: info: retrieveCib: Reading
> cluster configuration from: /var/lib/heartbeat/crm/cib.f5FXZH (digest:
> /var/lib/heartbeat/crm/cib.irSIZ7)
> May 18 10:32:54 ha2.iohost.com cib: [7779]: info: Managed
> write_cib_contents process 2378 exited with return code 0.
> May 18 10:33:11 ha2.iohost.com attrd: [7782]: info: attrd_ha_callback:
> flush message from ha1.iohost.com
> May 18 10:33:12 ha2.iohost.com attrd: [7782]: info: attrd_ha_callback:
> flush message from ha1.iohost.com
> May 18 10:33:12 ha2.iohost.com attrd: [7782]: info: attrd_ha_callback:
> flush message from ha1.iohost.com
> May 18 10:33:12 ha2.iohost.com attrd: [7782]: info: attrd_ha_callback:
> flush message from ha1.iohost.com
> May 18 10:35:14 ha2.iohost.com cib: [7779]: info: cib_stats: Processed
> 48 operations (13125.00us average, 0% utilization) in the last 10min
>
> And on ha1:
>
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element:
> cib:diff: -
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element:
> cib:diff: -
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element:
> cib:diff: -
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element:
> cib:diff: -
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element:
> cib:diff: - id="nodes-b159178d-c19b-4473-aa8e-13e487b65e33">
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element:
> cib:diff: - id="nodes-b159178d-c19b-4473-aa8e-13e487b65e33-standby" />
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element:
> cib:diff: -
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element:
> cib:diff: -
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element:
> cib:diff: -
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element:
> cib:diff: -
> May 17 22:33:02 ha1.iohost.com crmd: [8656]: info:
> abort_transition_graph: need_abort:59 - Triggered transition abort
> (complete=1) : Non-status change
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element:
> cib:diff: -
> May 17 22:33:02 ha1.iohost.com crmd: [8656]: info: need_abort: Aborting
> on change to admin_epoch
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element:
> cib:diff: +
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element:
> cib:diff: +
> May 17 22:33:02 ha1.iohost.com crmd: [8656]: info: do_state_transition:
> State transition S_IDLE ->  S_POLICY_ENGINE [ input=I_PE_CALC
> cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element:
> cib:diff: +
> May 17 22:33:02 ha1.iohost.com crmd: [8656]: info: do_state_transition:
> All 2 cluster nodes are eligible to run resources.
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element:
> cib:diff: +
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element:
> cib:diff: + id="nodes-b159178d-c19b-4473-aa8e-13e487b65e33">
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element:
> cib:diff: + id="nodes-b159178d-c19b-4473-aa8e-13e487b65e33-standby" />
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element:
> cib:diff: +
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element:
> cib:diff: +
> May 17 22:33:02 ha1.iohost.com crmd: [8656]: info: do_pe_invoke: Query
> 337: Requesting the current CIB: S_POLICY_ENGINE
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element:
> cib:diff: +
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element:
> cib:diff: +
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element:
> cib:diff: +
> May 17 22:33:02 ha1.iohost.com cib: [8652]: info: cib_process_request:
> Operation complete: op cib_modify for section nodes
> (origin=local/crm_attribute/4, version=0.102.1): ok (rc=0)
> May 17 22:33:02 h

Re: [Linux-HA] Need HA Help - standby / online not switching automatically

2011-05-17 Thread Randy Katz
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: 
ms_drbd_mysql: Rolling back scores from fs_mysql
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: native_color: 
Resource drbd_mysql:0 cannot run anywhere
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: native_color: 
Resource drbd_mysql:1 cannot run anywhere
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: 
ms_drbd_mysql: Rolling back scores from apache2



On 5/17/2011 9:28 PM, Randy Katz wrote:
> Hi,
>
> Relatively new to HA, though I have been using Xen and reading
> this list here and there; now I need some help:
>
> I have 2 nodes, physical, let's call node1/node2:
> In each I have VMs (Xen paravirt: ha1 & ha2). In each VM I have
> 2 LVs which are DRBD'd (r0 and r1: mysql data and html data). There is a
> VIP between them, resolving the website, which is a simple
> WordPress blog (so it has a database), and it works well.
>
> When I start them (reboot VMs) they start up fine and ha1 is
> online (primary) and ha2 is standby (secondary). If I:
>
> 1. crm node standby ha1.iohost.com - sometimes ha2.iohost.com takes
> over, sometimes I am left with 2 nodes on standby; not sure why.
> 2. If 2 nodes are in standby and I issue crm node online ha1.iohost.com,
> sometimes ha2 will become active (as it should have when ha1 went
> standby), sometimes ha1 will become active, and sometimes they will both
> remain standby; not sure why.
>
> Question: How do I test and debug this? What parameters in which config
> file affect this behavior?
>
> Thank you in advance,
> Randy

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


[Linux-HA] Need HA Help - standby / online not switching automatically

2011-05-17 Thread Randy Katz
Hi,

Relatively new to HA, though I have been using Xen and reading
this list here and there; now I need some help:

I have 2 nodes, physical, let's call node1/node2:
In each I have VMs (Xen paravirt: ha1 & ha2). In each VM I have
2 LVs which are DRBD'd (r0 and r1: mysql data and html data). There is a
VIP between them, resolving the website, which is a simple
WordPress blog (so it has a database), and it works well.

When I start them (reboot VMs) they start up fine and ha1 is
online (primary) and ha2 is standby (secondary). If I:

1. crm node standby ha1.iohost.com - sometimes ha2.iohost.com takes 
over, sometimes I am left with 2 nodes on standby; not sure why.
2. If 2 nodes are in standby and I issue crm node online ha1.iohost.com, 
sometimes ha2 will become active (as it should have when ha1 went standby), 
sometimes ha1 will become active, and sometimes they will both remain 
standby; not sure why.

Question: How do I test and debug this? What parameters in which config 
file affect this behavior?
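
One way to tackle the testing part, per the advice elsewhere in this thread,
is to dry-run the policy engine against the live CIB and look at the
placement scores it computes (a sketch; add or drop V's for more or less
detail):

# ptest -L -s -VVV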

Thank you in advance,
Randy
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


[Linux-HA] drbd compile question

2011-03-30 Thread Randy Katz
Hi,

I loaded openSUSE 11.4 and installed corosync, pacemaker, drbd, and xen, 
and now it says the drbd tools are not up to date. There is no other 
tool/package than drbd, so I would like to just compile drbd from source. 
The kernel is currently 2.6.37.1-1.2-xen; on the drbd site it says that 
would correspond to drbd 8.3.9. Other than the prefix and config 
directories, I notice there are other configuration options for drbd; must 
they be set in order for it to work properly with corosync, etc.?

I am referring specifically to these options:

   --with-xen          Enable Xen integration
   --with-pacemaker    Enable Pacemaker integration
   --with-heartbeat    Enable Heartbeat integration
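
For what it's worth, a build sequence with those switches might look like
this (a sketch; the paths are the common defaults, so verify them against
the DRBD user guide for 8.3.9 before using):

# ./configure --prefix=/usr --localstatedir=/var --sysconfdir=/etc \
      --with-xen --with-pacemaker --with-heartbeat
# make && make install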

Thank you in advance,
Randy
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Sort of crm commandes but off line ?

2011-03-24 Thread Randy Katz
This might sound obvious, but is an ssh call acceptable?
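
Something along these lines, say (a sketch, assuming passwordless ssh from
node1 and crm_mon in the PATH on the peers; crm_mon -1 should exit non-zero
when it cannot connect to a running cluster):

for n in node2 node3; do
    # one-shot status query; success means the cluster stack answered
    if ssh -o ConnectTimeout=5 "$n" crm_mon -1 >/dev/null 2>&1; then
        echo "Pacemaker is active on $n"
    fi
done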

On 3/23/2011 8:38 AM, Alain.Moulle wrote:
> Hi,
>
> I'm looking for a command which will give me information about the HA
> cluster, such as, for example, the hostnames of all nodes which are in
> the same HA cluster, BUT from a node where Pacemaker is not active.
>
> For example: I have a cluster with node1, node2, node3.
> Pacemaker is running on node2 & node3.
> Pacemaker is not running on node1, so any crm command returns
> "Signon to CIB failed: connection failed
> Init failed, could not perform requested operations"
> I'm on node1: I want to know (from a script) if Pacemaker is active
> on at least one other node in the HA cluster, including the node
> where I am (so node1).
>
> Is there a command which could give me such information "offline",
> or do I have to scan the uname fields in the node records of the CIB
> and ssh to the other nodes to get the information ...
>
> Thanks
> Alain

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] question on Creating an Active/Passive iSCSI configuration

2011-03-22 Thread Randy Katz
This was posted quite a while ago; would anyone care to look into it and
answer?

Thank you in advance,
Randy

On 3/14/2011 6:18 AM, Dejan Muhamedagic wrote:
> On Fri, Mar 11, 2011 at 10:23:37AM -0800, Randy Katz wrote:
>> On 3/11/2011 3:29 AM, Dejan Muhamedagic wrote:
>>> Hi,
>>>
>>> On Fri, Mar 11, 2011 at 01:36:25AM -0800, Randy Katz wrote:
>>>> On 3/11/2011 12:50 AM, RaSca wrote:
>>>>> Il giorno Ven 11 Mar 2011 07:32:32 CET, Randy Katz ha scritto:
>>>>>> ps - in /var/log/messages I find this:
>>>>>>
>>>>>> Mar 10 22:31:45 drbd1 lrmd: [3274]: ERROR: get_resource_meta: pclose
>>>>>> failed: Interrupted system call
>>>>>> Mar 10 22:31:45 drbd1 lrmd: [3274]: WARN: on_msg_get_metadata: empty
>>>>>> metadata for ocf::linbit::drbd.
>>>>>> Mar 10 22:31:45 drbd1 lrmadmin: [3481]: ERROR:
>>>>>> lrm_get_rsc_type_metadata(578): got a return code HA_FAIL from a reply
>>>>>> message of rmetadata with function get_ret_from_msg.
>>>>> [...]
>>>>>
>>>>> Hi,
>>>>> I think that the message "no such resource agent" explains what the
>>>>> matter is.
>>>>> Does the file /usr/lib/ocf/resource.d/linbit/drbd exist? Is the drbd
>>>>> file executable? Have you correctly installed the drbd packages?
>>>>>
>>>>> Check those things; you can try to reinstall drbd.
>>>>>
>>>> Hi
>>>>
>>>> # ls -l /usr/lib/ocf/resource.d/linbit/drbd
>>>> -rwxr-xr-x 1 root root 24523 Jun  4  2010
>>>> /usr/lib/ocf/resource.d/linbit/drbd
>>> Which cluster-glue version do you run?
>>> Try also:
>>>
>>> # lrmadmin -C
>>> # lrmadmin -P ocf drbd
>>> # export OCF_ROOT=/usr/lib/ocf
>>> # /usr/lib/ocf/resource.d/linbit/drbd meta-data
>> I am running from a source build/install as per clusterlabs.org as the
>> rpm's had broken dependencies and
>> would not install. I have now blown away that CentOS (one of them)
>> machine and installed openSUSE as they
>> said everything was included but it seems on 11.3 not on 11.4, on 11.4
>> the install is broken and so now
> I guess that openSUSE would like to hear about it too, just in
> which way it is broken.
>
I did an openSUSE 11.4 install from DVD, then used zypper to install 
pacemaker, heartbeat, corosync, and libpacemaker3. I ended up with a 
clusterlabs.repo and older versions, and had to break a dependency for 
pacemaker or it would not install. I found out later that there are newer 
versions, precompiled, in the openSUSE repository; you just need to ask for 
the specific versions and they will install. I had to remove the previous 
ones, as some new dependencies were created. The versions 
I ended up with are:

Name: pacemaker
Version: 1.1.5-3.2
Arch: x86_64
Vendor: openSUSE

Name: libpacemaker3
Version: 1.1.5-3.2
Arch: x86_64
Vendor: openSUSE

Name: heartbeat
Version: 3.0.4-25.28.1
Arch: x86_64
Vendor: openSUSE

Name: corosync
Version: 1.3.0-3.1
Arch: x86_64
Vendor: openSUSE

At this point I was not sure whether to install iet or tgt; I saw some 
examples with tgt, so I installed that. So far it looks like I have CRM 
working, and I have mocked up the example from ha-iscsi.pdf (trying to work 
around some of the errors; there are errors!). I noticed the floating IP 
addresses do not ping, so I added a new set; those ping, though the 
original ones do not. Perhaps something else in the config is 
prohibiting that. Here are my current crm config commands:

property stonith-enabled="false"
property no-quorum-policy="ignore"
property default-resource-stickiness="200"
primitive res_drbd_iscsivg01 ocf:linbit:drbd params 
drbd_resource="iscsivg01" op monitor interval="10s"
ms ms_drbd_iscsivg01 res_drbd_iscsivg01 meta master-max="1" 
master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
primitive res_drbd_iscsivg02 ocf:linbit:drbd params 
drbd_resource="iscsivg02" op monitor interval="10s"
ms ms_drbd_iscsivg02 res_drbd_iscsivg02 meta clone-max="2" notify="true"
primitive res_ip_alicebob01 ocf:heartbeat:IPaddr2 params 
ip="192.168.1.218" cidr_netmask="24" op monitor interval="10s"
primitive res_ip_alicebob02 ocf:heartbeat:IPaddr2 params 
ip="192.168.1.219" cidr_netmask="24" op monitor interval="10s"
primitive res_ip_c1c201 ocf:heartbeat:IPaddr2 params ip="192.168.1.220" 
cidr_netmask="24" op monitor interval="10s"
primitive res_ip_c1c202 ocf:heartbeat:IPaddr2 params ip="192.168.1.221" 
cidr_netmask="24" op monitor interval="10s"

Re: [Linux-HA] question on Creating an Active/Passive iSCSI configuration

2011-03-15 Thread Randy Katz
On 3/14/2011 6:18 AM, Dejan Muhamedagic wrote:
> On Fri, Mar 11, 2011 at 10:23:37AM -0800, Randy Katz wrote:
>> On 3/11/2011 3:29 AM, Dejan Muhamedagic wrote:
>>> Hi,
>>>
>>> On Fri, Mar 11, 2011 at 01:36:25AM -0800, Randy Katz wrote:
>>>> On 3/11/2011 12:50 AM, RaSca wrote:
>>>>> Il giorno Ven 11 Mar 2011 07:32:32 CET, Randy Katz ha scritto:
>>>>>> ps - in /var/log/messages I find this:
>>>>>>
>>>>>> Mar 10 22:31:45 drbd1 lrmd: [3274]: ERROR: get_resource_meta: pclose
>>>>>> failed: Interrupted system call
>>>>>> Mar 10 22:31:45 drbd1 lrmd: [3274]: WARN: on_msg_get_metadata: empty
>>>>>> metadata for ocf::linbit::drbd.
>>>>>> Mar 10 22:31:45 drbd1 lrmadmin: [3481]: ERROR:
>>>>>> lrm_get_rsc_type_metadata(578): got a return code HA_FAIL from a reply
>>>>>> message of rmetadata with function get_ret_from_msg.
>>>>> [...]
>>>>>
>>>>> Hi,
>>>>> I think that the message "no such resource agent" explains what the
>>>>> matter is.
>>>>> Does the file /usr/lib/ocf/resource.d/linbit/drbd exist? Is the drbd
>>>>> file executable? Have you correctly installed the drbd packages?
>>>>>
>>>>> Check those things; you can try to reinstall drbd.
>>>>>
>>>> Hi
>>>>
>>>> # ls -l /usr/lib/ocf/resource.d/linbit/drbd
>>>> -rwxr-xr-x 1 root root 24523 Jun  4  2010
>>>> /usr/lib/ocf/resource.d/linbit/drbd
>>> Which cluster-glue version do you run?
>>> Try also:
>>>
>>> # lrmadmin -C
>>> # lrmadmin -P ocf drbd
>>> # export OCF_ROOT=/usr/lib/ocf
>>> # /usr/lib/ocf/resource.d/linbit/drbd meta-data
>> I am running from a source build/install as per clusterlabs.org as the
>> rpm's had broken dependencies and
>> would not install. I have now blown away that CentOS (one of them)
>> machine and installed openSUSE as they
>> said everything was included but it seems on 11.3 not on 11.4, on 11.4
>> the install is broken and so now
> I guess that openSUSE would like to hear about it too, just in
> which way it is broken.
>
I did an openSUSE 11.4 install from DVD, then used zypper to install 
pacemaker, heartbeat, corosync, and libpacemaker3. I ended up with a 
clusterlabs.repo and older versions, and had to break a dependency for 
pacemaker or it would not install. I found out later that there are newer 
versions, precompiled, in the openSUSE repository; you just need to ask for 
the specific versions and they will install. I had to remove the previous 
ones, as some new dependencies were created. The versions 
I ended up with are:

Name: pacemaker
Version: 1.1.5-3.2
Arch: x86_64
Vendor: openSUSE

Name: libpacemaker3
Version: 1.1.5-3.2
Arch: x86_64
Vendor: openSUSE

Name: heartbeat
Version: 3.0.4-25.28.1
Arch: x86_64
Vendor: openSUSE

Name: corosync
Version: 1.3.0-3.1
Arch: x86_64
Vendor: openSUSE

At this point I was not sure whether to install iet or tgt; I saw some 
examples with tgt, so I installed that. So far it looks like I have CRM 
working, and I have mocked up the example from ha-iscsi.pdf (trying to work 
around some of the errors; there are errors!). I noticed the floating IP 
addresses do not ping, so I added a new set; those ping, though the 
original ones do not. Perhaps something else in the config is 
prohibiting that. Here are my current crm config commands:

property stonith-enabled="false"
property no-quorum-policy="ignore"
property default-resource-stickiness="200"
primitive res_drbd_iscsivg01 ocf:linbit:drbd params 
drbd_resource="iscsivg01" op monitor interval="10s"
ms ms_drbd_iscsivg01 res_drbd_iscsivg01 meta master-max="1" 
master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
primitive res_drbd_iscsivg02 ocf:linbit:drbd params 
drbd_resource="iscsivg02" op monitor interval="10s"
ms ms_drbd_iscsivg02 res_drbd_iscsivg02 meta clone-max="2" notify="true"
primitive res_ip_alicebob01 ocf:heartbeat:IPaddr2 params 
ip="192.168.1.218" cidr_netmask="24" op monitor interval="10s"
primitive res_ip_alicebob02 ocf:heartbeat:IPaddr2 params 
ip="192.168.1.219" cidr_netmask="24" op monitor interval="10s"
primitive res_ip_c1c201 ocf:heartbeat:IPaddr2 params ip="192.168.1.220" 
cidr_netmask="24" op monitor interval="10s"
primitive res_ip_c1c202 ocf:heartbeat:IPaddr2 params ip="192.168.1.221" 
cidr_netmask="24" op monitor interval="10s"

Re: [Linux-HA] question on Creating an Active/Passive iSCSI configuration

2011-03-11 Thread Randy Katz
On 3/11/2011 3:29 AM, Dejan Muhamedagic wrote:
> Hi,
>
> On Fri, Mar 11, 2011 at 01:36:25AM -0800, Randy Katz wrote:
>> On 3/11/2011 12:50 AM, RaSca wrote:
>>> Il giorno Ven 11 Mar 2011 07:32:32 CET, Randy Katz ha scritto:
>>>> ps - in /var/log/messages I find this:
>>>>
>>>> Mar 10 22:31:45 drbd1 lrmd: [3274]: ERROR: get_resource_meta: pclose
>>>> failed: Interrupted system call
>>>> Mar 10 22:31:45 drbd1 lrmd: [3274]: WARN: on_msg_get_metadata: empty
>>>> metadata for ocf::linbit::drbd.
>>>> Mar 10 22:31:45 drbd1 lrmadmin: [3481]: ERROR:
>>>> lrm_get_rsc_type_metadata(578): got a return code HA_FAIL from a reply
>>>> message of rmetadata with function get_ret_from_msg.
>>> [...]
>>>
>>> Hi,
>>> I think that the message "no such resource agent" explains what the
>>> matter is.
>>> Does the file /usr/lib/ocf/resource.d/linbit/drbd exist? Is the drbd
>>> file executable? Have you correctly installed the drbd packages?
>>>
>>> Check those things; you can try to reinstall drbd.
>>>
>> Hi
>>
>> # ls -l /usr/lib/ocf/resource.d/linbit/drbd
>> -rwxr-xr-x 1 root root 24523 Jun  4  2010
>> /usr/lib/ocf/resource.d/linbit/drbd
> Which cluster-glue version do you run?
> Try also:
>
> # lrmadmin -C
> # lrmadmin -P ocf drbd
> # export OCF_ROOT=/usr/lib/ocf
> # /usr/lib/ocf/resource.d/linbit/drbd meta-data
I am running from a source build/install as per clusterlabs.org, as the 
rpms had broken dependencies and would not install. I have now blown away 
that CentOS machine (one of them) and installed openSUSE, as they said 
everything was included, but that seems true of 11.3, not 11.4; on 11.4 
the install is broken, so I am now running some later versions and running 
into some other issues; I will report back with findings. Which OS distro 
is the least trouble to get this stuff running on? I just want to get it 
running, run a few tests, and then figure out where to go from there.

Thanks,
Randy
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] question on Creating an Active/Passive iSCSI configuration

2011-03-11 Thread Randy Katz
On 3/11/2011 12:50 AM, RaSca wrote:
> Il giorno Ven 11 Mar 2011 07:32:32 CET, Randy Katz ha scritto:
>> ps - in /var/log/messages I find this:
>>
>> Mar 10 22:31:45 drbd1 lrmd: [3274]: ERROR: get_resource_meta: pclose
>> failed: Interrupted system call
>> Mar 10 22:31:45 drbd1 lrmd: [3274]: WARN: on_msg_get_metadata: empty
>> metadata for ocf::linbit::drbd.
>> Mar 10 22:31:45 drbd1 lrmadmin: [3481]: ERROR:
>> lrm_get_rsc_type_metadata(578): got a return code HA_FAIL from a reply
>> message of rmetadata with function get_ret_from_msg.
> [...]
>
> Hi,
> I think that the message "no such resource agent" explains what the 
> matter is.
> Does the file /usr/lib/ocf/resource.d/linbit/drbd exist? Is the drbd 
> file executable? Have you correctly installed the drbd packages?
>
> Check those things; you can try to reinstall drbd.
>
Hi

# ls -l /usr/lib/ocf/resource.d/linbit/drbd
-rwxr-xr-x 1 root root 24523 Jun  4  2010 
/usr/lib/ocf/resource.d/linbit/drbd

DRBD is running fine; I have set up that part of it already. I am using 
the ha-iscsi.pdf guide and up to this point everything is fine.

Randy
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] question on Creating an Active/Passive iSCSI configuration

2011-03-10 Thread Randy Katz
ps - in /var/log/messages I find this:

Mar 10 22:31:45 drbd1 lrmd: [3274]: ERROR: get_resource_meta: pclose 
failed: Interrupted system call
Mar 10 22:31:45 drbd1 lrmd: [3274]: WARN: on_msg_get_metadata: empty 
metadata for ocf::linbit::drbd.
Mar 10 22:31:45 drbd1 lrmadmin: [3481]: ERROR: 
lrm_get_rsc_type_metadata(578): got a return code HA_FAIL from a reply 
message of rmetadata with function get_ret_from_msg.

On 3/10/2011 10:29 PM, Randy Katz wrote:
> Hi,
>
> I hope this is the correct forum for this, it is crm:
>
> The initial configuration commands worked or so it seems (not sure how
> to check).
>
> crm(live)# configure
> crm(live)configure# property stonith-enabled="false"
> crm(live)configure# property no-quorum-policy="ignore"
> crm(live)configure# property default-resource-stickiness="200"
> crm(live)configure# commit
>
> Then the following, the first command gives an error:
>
> # crm
> crm(live)# configure
> crm(live)configure# primitive res_drbd_iscsivg01 \
>   >  ocf:linbit:drbd \
>   >  params drbd_resource="iscsivg01" \
>   >  op monitor interval="10"
> lrmadmin[3437]: 2011/03/10_22:29:03 ERROR:
> lrm_get_rsc_type_metadata(578): got a return code HA_FAIL from a reply
> message of rmetadata with function get_ret_from_msg.
> ERROR: ocf:linbit:drbd: could not parse meta-data:
> ERROR: ocf:linbit:drbd: no such resource agent
>
>> I am a bit lost, not sure what to check from here.
> Thank you in advance,
> Randy

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


[Linux-HA] question on Creating an Active/Passive iSCSI configuration

2011-03-10 Thread Randy Katz
Hi,

I hope this is the correct forum for this, it is crm:

The initial configuration commands worked, or so it seems (I am not sure 
how to check).

crm(live)# configure
crm(live)configure# property stonith-enabled="false"
crm(live)configure# property no-quorum-policy="ignore"
crm(live)configure# property default-resource-stickiness="200"
crm(live)configure# commit

Then the following, the first command gives an error:

# crm
crm(live)# configure
crm(live)configure# primitive res_drbd_iscsivg01 \
 > ocf:linbit:drbd \
 > params drbd_resource="iscsivg01" \
 > op monitor interval="10"
lrmadmin[3437]: 2011/03/10_22:29:03 ERROR: 
lrm_get_rsc_type_metadata(578): got a return code HA_FAIL from a reply 
message of rmetadata with function get_ret_from_msg.
ERROR: ocf:linbit:drbd: could not parse meta-data:
ERROR: ocf:linbit:drbd: no such resource agent

I am a bit lost, not sure what to check from here.
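
Some read-only checks that narrow this down, in the direction the replies
elsewhere in this thread take (a sketch; "linbit" is the provider the error
names):

# crm ra classes
# crm ra list ocf linbit
# ls -l /usr/lib/ocf/resource.d/linbit/
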
Thank you in advance,
Randy
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems