On 7 Feb 2014, at 10:22 pm, Asgaroth <li...@blueface.com> wrote:

> On 06/02/2014 05:52, Vladislav Bogdanov wrote:
>> Hi,
>>
>> I bet your problem comes from the LSB clvmd init script.
>> Here is what it does:
>>
>> ===========
>> ...
>> clustered_vgs() {
>>     ${lvm_vgdisplay} 2>/dev/null | \
>>         awk 'BEGIN {RS="VG Name"} {if (/Clustered/) print $1;}'
>> }
>>
>> clustered_active_lvs() {
>>     for i in $(clustered_vgs); do
>>         ${lvm_lvdisplay} $i 2>/dev/null | \
>>             awk 'BEGIN {RS="LV Name"} {if (/[^N^O^T] available/) print $1;}'
>>     done
>> }
>>
>> rh_status() {
>>     status $DAEMON
>> }
>> ...
>> case "$1" in
>> ...
>>   status)
>>     rh_status
>>     rtrn=$?
>>     if [ $rtrn = 0 ]; then
>>         cvgs="$(clustered_vgs)"
>>         echo Clustered Volume Groups: ${cvgs:-"(none)"}
>>         clvs="$(clustered_active_lvs)"
>>         echo Active clustered Logical Volumes: ${clvs:-"(none)"}
>>     fi
>> ...
>> esac
>>
>> exit $rtrn
>> =========
>>
>> So, it not only looks for the status of the daemon itself, but also tries
>> to list volume groups. And this operation is blocked because fencing is
>> still in progress, and the whole cLVM thing (as well as DLM itself and
>> all other dependent services) is frozen. So your resource times out in
>> the monitor operation, and then pacemaker asks it to stop (unless you have
>> on-fail=fence). Anyway, there is a big chance that stop will fail too,
>> and that leads again to fencing. cLVM is very fragile in my opinion
>> (although newer versions running on the corosync2 stack seem to be much
>> better). And it probably still doesn't work well when managed by
>> pacemaker in CMAN-based clusters, because it blocks globally if any node
>> in the whole cluster is online at the cman layer but doesn't run clvmd
>> (I checked last time with .99). And that was the same for all stacks,
>> until it was fixed for the corosync (only 2?) stack recently. The problem
>> with that is that you cannot just stop pacemaker on one node (e.g. for
>> maintenance), you should immediately stop cman as well (or run clvmd in
>> the cman'ish way) - otherwise cLVM freezes on another node. This should be
>> easily fixable in clvmd code, but nobody cares.
>
> Thanks for the explanation, this is interesting for me as I need a volume
> manager in the cluster to manage the shared file systems in case I need to
> resize for some reason. I think I may be coming up against something similar
> now that I am testing cman outside of the cluster, even though I have
> cman/clvmd enabled outside pacemaker the clvmd daemon still hangs even when
> the 2nd node has been rebooted due to a fence operation,

If you have configured cman to use fence_pcmk, then all cman/dlm/clvmd fencing
operations are sent to Pacemaker. If you aren't running pacemaker, then you
have a big problem, as no-one can perform fencing.

I don't know if you are testing without pacemaker running, but if so you would
need to configure cman with real fencing devices.

> when it (node 2) reboots, cman & clvmd starts, I can see both nodes as
> members using cman_tool, but clvmd still seems to have an issue, it just
> hangs, I can't see off-hand if dlm still thinks pacemaker is in the fence
> operation (or if it has already returned true for successful fence). I am
> still gathering logs and will post back to this thread once I have all my
> logs from yesterday and this morning.
>
> I don't suppose there is another volume manager available that would be
> cluster aware that anyone is aware of?
>
>>
>> Increasing the timeout for the LSB clvmd resource probably won't help you,
>> because blocked (because DLM waits for fencing) LVM operations iirc
>> never finish.
>>
>> You may want to search for the clvmd OCF resource-agent, it is available
>> for SUSE I think. Although it is not perfect, it should work much better
>> for you
>
> I will have a look around for this clvmd ocf agent, and see what is involved
> in getting it to work on CentOS 6.5 if I don't have any success with the
> current recommendation for running it outside of pacemaker control.
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
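
To make the fence_pcmk point concrete, here is a rough sketch of a check for
the "fencing is redirected to Pacemaker but Pacemaker is not running"
situation. Treat it as an illustration only: the config path
/etc/cluster/cluster.conf, the process name pacemakerd, and the function and
variable names are all assumptions of this sketch, not something taken from
the thread or from any shipped script.

```shell
#!/bin/sh
# Sketch: detect the case where cman hands fencing to Pacemaker (fence_pcmk)
# while Pacemaker itself is not running, so nobody can complete a fence.
# ASSUMPTIONS (hypothetical, adjust for your system): the cman config lives at
# /etc/cluster/cluster.conf and Pacemaker's main process is named "pacemakerd".

CLUSTER_CONF=${CLUSTER_CONF:-/etc/cluster/cluster.conf}
PCMK_PROC=${PCMK_PROC:-pacemakerd}

# Succeeds if the cman configuration mentions the fence_pcmk agent anywhere.
uses_fence_pcmk() {
    grep -q 'fence_pcmk' "$CLUSTER_CONF" 2>/dev/null
}

# Succeeds if a process with exactly the given name is currently running.
pacemaker_running() {
    pgrep -x "$PCMK_PROC" >/dev/null 2>&1
}

# Prints a one-line verdict; returns 1 only in the "nobody can fence" case.
check_fencing_path() {
    if ! uses_fence_pcmk; then
        echo "cman does not redirect fencing to Pacemaker (no fence_pcmk in $CLUSTER_CONF)"
        return 0
    fi
    if pacemaker_running; then
        echo "fence_pcmk is configured and $PCMK_PROC is running"
        return 0
    fi
    echo "fence_pcmk is configured but $PCMK_PROC is not running: fencing requests cannot complete"
    return 1
}
```

Calling check_fencing_path on a node before testing cman/clvmd without
pacemaker would flag the problematic combination. It only greps for the agent
name, so it does not verify that every node actually has a fence method
defined.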