[Linux-ha-dev] possible deadlock in lrmd?
Hi,

Senko Rasic proposed a patch for the client unregister which would prevent a double unref of glib sources (IPC channel). However, I cannot recall any deadlocks in lrmd. See

http://www.init.hr/dev/cluster/patches/2444.diff

Senko, is this something you observed or you just thought it might occur?

BTW, this same or very similar procedure is used by all programs including pacemaker.

Thanks,

Dejan

P.S. Senko, if you want to post to this mailing list, you'll have to subscribe first.
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] possible deadlock in lrmd?
On Thu, Oct 21, 2010 at 11:57:56AM +0200, Senko Rasic wrote:
> Hi,
>
> I'm replying directly to you with this, feel free to forward to the list. For further discussions I could also join the list.

Yes, it would be good.

> On 10/21/2010 11:34 AM, Dejan Muhamedagic wrote:
>> Senko Rasic proposed a patch for the client unregister which would prevent a double unref of glib sources (IPC channel). However, I cannot recall any deadlocks in lrmd. See
>>
>> http://www.init.hr/dev/cluster/patches/2444.diff
>>
>> Senko, is this something you observed or you just thought it might occur?
>
> This is what I've observed happening whenever I had g_type_init() invoked either in the lrmd itself, or in the plugin. g_type_init() initialises the glib type system, which is needed for the dbus glib proxies to work. So, when g_type_init() was called earlier, the deadlock would eventually happen in the part of the code I patched.
>
> It should be very easy to test; just try using lrmd with g_type_init() and without, and it should be apparent that it gets blocked after the first request, no matter which plugin serves the request.

Perhaps it would help to do g_type_init() on lrmd startup before doing any glib2 related stuff (in particular the IPC). Not sure, but it could be a problem with locking, i.e. that the deadlock you saw was actually lrmd waiting on some mutex. lrmd is not a threaded application and therefore has no g_thread_init(). Apparently, g_thread_init() is needed (or recommended) before g_type_init(). There's also a recent patch which enables threading within g_type_init():

http://osdir.com/ml/svn-commits-list/2010-01/msg03809.html

Could you give it a try and move g_type_init() to init_start() before any other glib calls? Perhaps with g_thread_init() too. Though I'm not sure if this is going to work at all.

Cheers,

Dejan

>> BTW, this same or very similar procedure is used by all programs including pacemaker.
>
> Right, the code I commented out didn't look obviously wrong to me (so I added a comment about the potential ref leak). Unfortunately I don't know much about lrmd internals and haven't been able to reason out why this would be happening.
>
> BR,
> Senko
>
> --
> Senko Rasic, DobarKod d.o.o.
> senko.ra...@dobarkod.hr http://dobarkod.hr/
[Linux-ha-dev] [PATCH 0 of 5] Several fixes to the ManageVE RA
Hello everyone,

I've implemented migration for the ManageVE RA (via OpenVZ's checkpoint/restore mechanism), and in the process cleaned up a few little things. Please review.

If I don't hear of anyone reporting show stoppers in this patchset, I intend to push it by mid next week.

Thanks in advance for everyone's feedback!

Cheers,
Florian
[Linux-ha-dev] [PATCH 2 of 5] Medium: ManageVE: add migration capability
# HG changeset patch
# User Florian Haas <florian.h...@linbit.com>
# Date 1287380876 -7200
# Node ID 8c88afdf27b0de507ce409af74312f5cd92eed66
# Parent f238c087b79730a64f576b62c10e7cbcb638d000
Medium: ManageVE: add migration capability

OpenVZ does not support live migration, but it supports checkpoint/restore. Which is close enough.

diff -r f238c087b797 -r 8c88afdf27b0 heartbeat/ManageVE
--- a/heartbeat/ManageVE	Tue Oct 19 19:03:57 2010 +0200
+++ b/heartbeat/ManageVE	Mon Oct 18 07:47:56 2010 +0200
@@ -91,6 +91,8 @@
 <action name="stop" timeout="75" />
 <action name="status" depth="0" timeout="10" interval="10" />
 <action name="monitor" depth="0" timeout="10" interval="10" />
+<action name="migrate_to" timeout="75" />
+<action name="migrate_from" timeout="75" />
 <action name="validate-all" timeout="5" />
 <action name="meta-data" timeout="5" />
 </actions>
@@ -162,6 +164,22 @@
   return $OCF_SUCCESS
 }
 
+migrate_to_ve()
+{
+  if ! status_ve; then
+    ocf_log err "VE $VEID is not running, aborting"
+    exit $OCF_ERR_GENERIC
+  fi
+  ocf_run $VZCTL chkpnt $VEID || exit $OCF_ERR_GENERIC
+  return $OCF_SUCCESS
+}
+
+migrate_from_ve()
+{
+  ocf_run $VZCTL restore $VEID || exit $OCF_ERR_GENERIC
+  return $OCF_SUCCESS
+}
+
 #
 # status_ve()
 #
@@ -282,6 +300,12 @@
   status|monitor)
     status_ve
     ;;
+  migrate_to)
+    migrate_to_ve
+    ;;
+  migrate_from)
+    migrate_from_ve
+    ;;
   validate-all)
     validate_all_ve
     ;;
[Linux-ha-dev] [PATCH 1 of 5] Low: ManageVE: exit, don't return, on error
# HG changeset patch
# User Florian Haas <florian.h...@linbit.com>
# Date 1287507837 -7200
# Node ID f238c087b79730a64f576b62c10e7cbcb638d000
# Parent 50a340003ffe0310c08425b73c141e96c42b468f
Low: ManageVE: exit, don't return, on error

diff -r 50a340003ffe -r f238c087b797 heartbeat/ManageVE
--- a/heartbeat/ManageVE	Fri Oct 15 16:58:07 2010 +0200
+++ b/heartbeat/ManageVE	Tue Oct 19 19:03:57 2010 +0200
@@ -115,7 +115,7 @@
   veexists=`$VZCTL status $VEID 2>/dev/null | $AWK '{print $3}'`
   if [[ $veexists != "exist" ]]; then
     ocf_log err "vzctl status $VEID returned: $VEID does not exist."
-    return $OCF_ERR_INSTALLED
+    exit $OCF_ERR_INSTALLED
   fi
 
   status_ve
@@ -132,7 +132,7 @@
 
   if [[ $retcode != 0 && $retcode != 32 ]]; then
     ocf_log err "vzctl start $VEID returned: $retcode"
-    return $OCF_ERR_GENERIC
+    exit $OCF_ERR_GENERIC
   fi
 
   return $OCF_SUCCESS
@@ -156,7 +156,7 @@
 
   if [[ $retcode != 0 ]]; then
     ocf_log err "vzctl stop $VEID returned: $retcode"
-    return $OCF_ERR_GENERIC
+    exit $OCF_ERR_GENERIC
   fi
 
   return $OCF_SUCCESS
@@ -180,7 +180,7 @@
 
   if [[ $retcode != 0 ]]; then
     ocf_log err "vzctl status $VEID returned: $retcode"
-    return $OCF_ERR_GENERIC
+    exit $OCF_ERR_GENERIC
   fi
 
   if [[ $veexists != "exist" ]]; then
@@ -197,7 +197,7 @@
       ;;
     *)
       ocf_log err "vzctl status $VEID, wrong output format. (5th column: $vestatus)"
-      return $OCF_ERR_GENERIC
+      exit $OCF_ERR_GENERIC
       ;;
   esac
 }
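The subject line of this patch turns on a shell detail: `return` only leaves the current function, so a caller that does not check the status carries on, whereas `exit` ends the whole agent with that code. The following is a minimal bash sketch of that difference, not code from ManageVE. All function names are hypothetical, and each action runs in a subshell only so that `exit` does not end the demo itself.

```bash
#!/bin/bash
# OCF return codes used in this sketch (values per the OCF convention).
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# Hypothetical helper that reports failure with `return`:
# only this function ends, the caller keeps running.
fail_with_return() {
    return $OCF_ERR_GENERIC
}

# Hypothetical helper that reports failure with `exit`:
# the whole (sub)shell ends immediately.
fail_with_exit() {
    exit $OCF_ERR_GENERIC
}

# Callers that forget to check the helper's status.
action_using_return() {
    fail_with_return
    echo "continued after failure"
    return $OCF_SUCCESS
}

action_using_exit() {
    fail_with_exit
    echo "continued after failure"
    return $OCF_SUCCESS
}

# Run each action in a subshell and record its output and exit status.
rc_return=0
out_return=$(action_using_return) || rc_return=$?
rc_exit=0
out_exit=$(action_using_exit) || rc_exit=$?

echo "return: rc=$rc_return output='$out_return'"
echo "exit:   rc=$rc_exit output='$out_exit'"
```

With `return`, the unchecked failure is masked and the action reports success; with `exit`, the failure code reaches whoever invoked the script.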
[Linux-ha-dev] [PATCH 3 of 5] Low: ManageVE: clean up start action
# HG changeset patch
# User Florian Haas <florian.h...@linbit.com>
# Date 1287508227 -7200
# Node ID 3f4c98a0c37915877baee2d970b8051f0a4cef7b
# Parent 8c88afdf27b0de507ce409af74312f5cd92eed66
Low: ManageVE: clean up start action

Invoke status before start, this removes the need for checking for the 32 exit code (VE already running) on start. Use ocf_run.

diff -r 8c88afdf27b0 -r 3f4c98a0c379 heartbeat/ManageVE
--- a/heartbeat/ManageVE	Mon Oct 18 07:47:56 2010 +0200
+++ b/heartbeat/ManageVE	Tue Oct 19 19:10:27 2010 +0200
@@ -103,39 +103,16 @@
 #
 # start_ve()
 #
-# ATTENTION: The following code relies on vzctl's exit codes, especially:
-#
-# 0 : success
-# 32 : VE already running
-#
-# In case any of those exit codes change, this function will need fixing.
+# Starts a VE, or simply logs a message if the VE is already running.
 #
 start_ve()
 {
-  declare -i retcode
-
-  veexists=`$VZCTL status $VEID 2>/dev/null | $AWK '{print $3}'`
-  if [[ $veexists != "exist" ]]; then
-    ocf_log err "vzctl status $VEID returned: $VEID does not exist."
-    exit $OCF_ERR_INSTALLED
+  if status_ve; then
+    ocf_log info "VE $VEID already running."
+    return $OCF_SUCCESS
   fi
 
-  status_ve
-  retcode=$?
-
-  if [[ $retcode == $OCF_SUCCESS ]]; then
-    return $OCF_SUCCESS
-  elif [[ $retcode != $OCF_NOT_RUNNING ]]; then
-    return $retcode
-  fi
-
-  $VZCTL start $VEID >& /dev/null
-  retcode=$?
-
-  if [[ $retcode != 0 && $retcode != 32 ]]; then
-    ocf_log err "vzctl start $VEID returned: $retcode"
-    exit $OCF_ERR_GENERIC
-  fi
+  ocf_run $VZCTL start $VEID || exit $OCF_ERR_GENERIC
 
   return $OCF_SUCCESS
 }
[Linux-ha-dev] [PATCH 4 of 5] Low: ManageVE: clean up stop action
# HG changeset patch
# User Florian Haas <florian.h...@linbit.com>
# Date 1287508607 -7200
# Node ID d7d0f2719155a521c23a1e0308cfffe2ad67c09b
# Parent 3f4c98a0c37915877baee2d970b8051f0a4cef7b
Low: ManageVE: clean up stop action

Invoke status before stop. Use ocf_run.

diff -r 3f4c98a0c379 -r d7d0f2719155 heartbeat/ManageVE
--- a/heartbeat/ManageVE	Tue Oct 19 19:10:27 2010 +0200
+++ b/heartbeat/ManageVE	Tue Oct 19 19:16:47 2010 +0200
@@ -128,15 +128,13 @@
 #
 stop_ve()
 {
-  declare -i retcode
+  status_ve
+  if [ $? -eq $OCF_NOT_RUNNING ]; then
+    ocf_log info "VE $VEID already stopped."
+    return $OCF_SUCCESS
+  fi
 
-  $VZCTL stop $VEID >& /dev/null
-  retcode=$?
-
-  if [[ $retcode != 0 ]]; then
-    ocf_log err "vzctl stop $VEID returned: $retcode"
-    exit $OCF_ERR_GENERIC
-  fi
+  ocf_run $VZCTL stop $VEID || exit $OCF_ERR_GENERIC
 
   return $OCF_SUCCESS
 }
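The start and stop cleanups share one idea: ask status first, so that a repeated start or stop becomes a harmless no-op instead of depending on vzctl's exit codes. The sketch below illustrates that pattern in bash under stated assumptions. VE_STATE, fake_start and fake_stop are hypothetical stand-ins for the real container and for "ocf_run $VZCTL start/stop", and echo replaces ocf_log, so this is not the agent's actual code.

```bash
#!/bin/bash
# OCF return codes used in this sketch (values per the OCF convention).
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical stand-in for the VE: a variable instead of a real container.
VE_STATE=stopped

# Stub for status_ve(): success when running, OCF_NOT_RUNNING otherwise.
status_ve() {
    if [ "$VE_STATE" = running ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}

# Stubs replacing "ocf_run $VZCTL start $VEID" and "... stop $VEID".
fake_start() { VE_STATE=running; }
fake_stop()  { VE_STATE=stopped; }

# Start is a no-op when the VE already runs.
start_ve() {
    if status_ve; then
        echo "already running"
        return $OCF_SUCCESS
    fi
    fake_start || exit $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

# Stop is a no-op when the VE is already stopped.
stop_ve() {
    local rc=0
    status_ve || rc=$?
    if [ "$rc" -eq "$OCF_NOT_RUNNING" ]; then
        echo "already stopped"
        return $OCF_SUCCESS
    fi
    fake_stop || exit $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}
```

Calling start_ve twice in a row, or stop_ve twice in a row, succeeds both times; the second call only prints the "already" message. The stop sketch captures the status with "|| rc=$?" rather than reading $? on the next line, which is a small deviation from the patch that keeps the function safe under "set -e".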
Re: [Linux-HA] Searching for Heartbeat 2.1.4 GUI for CentOS 4.4 x86_64
On Wed, October 20, 2010 11:21 pm, Tony Hunter wrote:
>> Normally it should be installed by DRBD MC, whatever OS you are using. It should be called /usr/local/bin/drbd-gui-helper-0.8.2 in this case. Are you saying it's not there? If you copy it there and rename it, it should be enough. I promise I test it on Windows next time. :)
>
> How do you mean? If I downloaded DMC-0.8.2.jar and executed it on a WIN client, how would I know that "it should be installed by DRBD MC, whatever OS you are using"? I don't see anything at http://oss.linbit.com/drbd-mc/ or http://www.drbd.org/mc/management-console/ that gives cluster host requirements for DMC-0.8.2.jar.

I mean that DRBD MC installs it after it connects to the hosts, without any human intervention. If it doesn't, you've found a bug. The DMC jar file contains drbd-gui-helper, and if you unzip it, you'll see it there.

Rasto

--
: Dipl-Ing Rastislav Levrinc
: DRBD-MC http://www.drbd.org/mc/management-console/
: DRBD/HA support and consulting http://www.linbit.com/
DRBD(R) and LINBIT(R) are registered trademarks of LINBIT, Austria.
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Searching for Heartbeat 2.1.4 GUI for CentOS 4.4 x86_64
On Wed, October 20, 2010 8:10 pm, Tony Hunter wrote:
> What are the requirements for the unix hosts, if I want to run DRBD MC on 'the other' OS. :). I tested firing up DMC-0.8.2.jar on a WIN box, and found I had to download drbd-mc-0.8.2.tar.gz and copy drbd-gui-helper to /usr/local/bin/ on the unix hosts.

I've just tried it on Windows XP and it installs/creates the drbd-gui-helper-0.8.2 just fine, so it works for me(tm). What error message told you that drbd-gui-helper isn't there? Are you connecting as a user other than root without sudo? That, for example, would explain it. You can also run

java -jar DMC-0.8.2.jar

from the command line and see if it prints anything interesting.

Rasto
Re: [Linux-HA] superfluous dependency in heartbeat spec file
Hi Vadym,

can I apply this also on SLES10/SLES11?

TIA

Nikita Michalko

On Tuesday, 12 October 2010 14:29, Vadym Chepkov wrote:
> It was brought up in pacemaker mail list but this applies to heartbeat rpm packaging as well. Libraries do not depend on base package, they are independent. This is how one can install several versions of the same library (compat- packages). Also it is possible to use heartbeat libraries without using heartbeat daemon itself (if one uses pacemaker with corosync, for instance).
>
> Vadym
>
> # HG changeset patch
> # User Vadym Chepkov <vchep...@gmail.com>
> # Date 1286886305 14400
> # Node ID f1aea427d2c01756e06b4b917787c21ee440f24b
> # Parent 82fc843fbcf9733e50bbc169c95e51b6c7f97c54
> Fix package inter-dependencies
>
> diff -r 82fc843fbcf9 -r f1aea427d2c0 heartbeat-fedora.spec
> --- a/heartbeat-fedora.spec	Mon Oct 04 22:12:37 2010 +0200
> +++ b/heartbeat-fedora.spec	Tue Oct 12 08:25:05 2010 -0400
> @@ -40,6 +40,7 @@
>  BuildRequires: which
>  BuildRequires: cluster-glue-libs-devel
>  BuildRequires: libxslt docbook-dtds docbook-style-xsl
> +Requires: heartbeat-libs = %{version}-%{release}
>  Requires: PyXML
>  Requires: resource-agents
>  Requires: cluster-glue-libs
> @@ -81,7 +82,6 @@
>  %package libs
>  Summary: Heartbeat libraries
>  Group: System Environment/Daemons
> -Requires: heartbeat = %{version}-%{release}
>
>  %description libs
>  Heartbeat library package
> @@ -89,7 +89,7 @@
>  %package devel
>  Summary: Heartbeat development package
>  Group: System Environment/Daemons
> -Requires: heartbeat = %{version}-%{release}
> +Requires: heartbeat-libs = %{version}-%{release}
>
>  %description devel
>  Headers and shared libraries for writing programs for Heartbeat
[Linux-HA] Redundant Rings Still Not There?
> note that redundant rings are still not there to paraphrase the upstream.

Dejan, your previous comment about redundant rings is worthy of a whole new thread.

I hope I am not barking up the wrong tree, but my plan was to create a 3 node cluster with 2 resources, where resource R1 is allowed to run on NODE1 and NODE3 (usually on NODE1), and resource R2 is allowed to run on NODE2 and NODE3 (usually on NODE2). Thus NODE1 and NODE2 cannot run each other's resources and NODE3 operates as a failover for both of them.

The drawing I sent in the message entitled "My Second Pacemaker Cluster" was designed to accomplish this.

Is what I'm trying to do impossible? I thought it was totally in the spirit of Pacemaker clustering.

--
Eric Robinson

Disclaimer - October 21, 2010
This email and any files transmitted with it are confidential and intended solely for General Linux-HA mailing list. If you are not the named addressee you should not disseminate, distribute, copy or alter this email. Any views or opinions presented in this email are solely those of the author and might not represent those of Physicians' Managed Care or Physician Select Management. Warning: Although Physicians' Managed Care or Physician Select Management has taken reasonable precautions to ensure no viruses are present in this email, the company cannot accept responsibility for any loss or damage arising from the use of this email or attachments.
This disclaimer was added by Policy Patrol: http://www.policypatrol.com/
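The placement described in this message (R1 on NODE1 or NODE3, R2 on NODE2 or NODE3) is normally expressed with location constraints, independently of how the cluster links are laid out. A hedged sketch in crm shell configure syntax follows. The resource and node names come from the message; the constraint ids and the score of 100 are illustrative assumptions, not values from the thread.

```
# Prefer the usual node for each resource (score 100 is an arbitrary example).
location R1-prefers-NODE1 R1 100: NODE1
location R2-prefers-NODE2 R2 100: NODE2

# Forbid each resource from running on the other primary node,
# which leaves NODE3 as the only failover target for both.
location R1-never-on-NODE2 R1 -inf: NODE2
location R2-never-on-NODE1 R2 -inf: NODE1
```

With constraints like these, R1 can only land on NODE1 or NODE3 and R2 only on NODE2 or NODE3, whatever network paths the membership layer uses.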
Re: [Linux-HA] Redundant Rings Still Not There?
> The way resources move around has nothing to do with how you setup corosync rings. For each ring, all nodes must be accessible over the interface specified in the interface section. How else can one form a ring? ;-)

I think the confusion is entering the picture because (1) I'm using back-to-back Ethernet connections for DRBD replication. Those are point-to-point links from NODE1 to NODE3 and NODE2 to NODE3, but there is no link necessary between NODE1 and NODE2 because there is no DRBD replication between them. But (2) I am also using those links for corosync communication because they are more reliable than using the bonded interfaces through the switched network (although I am using those too).

So I guess what I'm trying to accomplish is to have three separate corosync rings:

-- Ring 1 through the switched network that includes all three nodes, where pacemaker is configured with resource constraints to keep R1 and R2 on their assigned node pairs.

-- Ring 2 that includes NODE1 and NODE3 (logically a two-node ring, though technically just back-to-back)

-- Ring 3 that includes NODE2 and NODE3 (logically a two-node ring, though technically just back-to-back)

Does that make sense?

--
Eric Robinson
[Linux-HA] How to use STONITH plugins external/vmware?
Dear All,

I am new here. Does anybody know how to configure the STONITH plugin external/vmware? If you do, could you share your experience?

Thanks.

Best wishes,
Dika.Ye
Re: [Linux-HA] superfluous dependency in heartbeat spec file
On Oct 21, 2010, at 5:09 AM, Nikita Michalko wrote:
> Hi Vadym,
>
> can I apply this also on SLES10/SLES11?

I don't see a reason why not; they are rpm based, as far as I know.

Vadym

> TIA
>
> Nikita Michalko