Hi Rhesa,

I agree that the problem is related to LVM; most likely clvmd cannot acquire its locks through DLM. Since the cluster runs fine for 3-4 days before failing, I assume it is not misconfigured. I've seen this before caused by networking problems (usually multicast traffic being filtered). Can you double-check that iptables is allowing all the required cluster traffic?
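As a rough sketch of what I mean by checking the firewall, the snippet below scans saved `iptables -S` output for explicit ACCEPT rules on the ports a cman/clvmd stack normally uses. The port list (UDP 5404 and 5405 for corosync/cman, TCP 21064 for DLM) is the usual default and is an assumption here, not something taken from your setup; the function name and the `REQUIRED` list are made up for illustration.

```python
# Assumed default ports for a cman-based cluster (not taken from this thread):
# corosync/cman totem traffic on UDP 5404 and 5405, DLM on TCP 21064.
REQUIRED = [("udp", 5404), ("udp", 5405), ("tcp", 21064)]


def missing_cluster_ports(rules_text):
    """Return the (proto, port) pairs from REQUIRED that have no explicit
    ACCEPT rule in the INPUT chain of the given `iptables -S` output."""
    allowed = set()
    for line in rules_text.splitlines():
        fields = line.split()
        # Only consider rules appended to INPUT that end in "-j ACCEPT".
        if fields[:2] != ["-A", "INPUT"] or fields[-2:] != ["-j", "ACCEPT"]:
            continue
        if "-p" not in fields:
            continue
        proto = fields[fields.index("-p") + 1]
        # "--dport N" is a single port; "--dports A,B:C" is a multiport list.
        for flag in ("--dport", "--dports"):
            if flag not in fields:
                continue
            spec = fields[fields.index(flag) + 1]
            for part in spec.split(","):
                lo, _, hi = part.partition(":")
                for port in range(int(lo), int(hi or lo) + 1):
                    allowed.add((proto, port))
    return [pair for pair in REQUIRED if pair not in allowed]


if __name__ == "__main__":
    import sys

    # Usage (as root on each node): iptables -S | python check_ports.py
    for proto, port in missing_cluster_ports(sys.stdin.read()):
        print("no explicit ACCEPT rule for %s/%d" % (proto, port))
```

This is only a heuristic: it ignores source/destination restrictions, the chain's default policy and catch-all rules, and it cannot tell whether multicast is actually being delivered between the nodes. An empty result just means the obvious rules are present.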
Also, what is the output of clustat during the failure?

Cheers

Ruben

On Wed, Feb 26, 2014 at 3:50 AM, Rhesa Mahendra <[email protected]> wrote:

> Guys,
>
> I am building a production setup on SAN storage, so I think OpenNebula
> needs LVM/CLVM for this. I have been working on it for 3 months, but after
> I created 50 VMs from one template on 3 nodes, LVM/CLVM stopped working
> properly: the VMs are still in PROLOG state after two days. Please see:
>
> 0:00 bash -c if [ -x "/var/tmp/one/im/run_probes" ]; then /var/tmp/one/im/run_probes kvm /var/lib/one//datastores 4124 20 0 idc-conode001; else
> 14447 ?        S      0:00 /bin/bash /var/tmp/one/im/run_probes kvm /var/lib/one//datastores 4124 20 0 idc-conode001
> 14454 ?        S      0:00 /bin/bash /var/tmp/one/im/run_probes kvm /var/lib/one//datastores 4124 20 0 idc-conode001
> 14455 ?        S      0:00 /bin/bash /var/tmp/one/im/run_probes kvm /var/lib/one//datastores 4124 20 0 idc-conode001
> 14460 ?        S      0:00 /bin/bash ./collectd-client_control.sh kvm /var/lib/one//datastores 4124 20 0 idc-conode001
> 14467 ?        S      0:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 idc-conode001
> 14474 ?        S      0:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 idc-conode001
> 14475 ?        S      0:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 idc-conode001
> 14498 ?        S      0:00 /bin/bash ./monitor_ds.sh kvm-probes /var/lib/one//datastores 4124 20 0 idc-conode001
> 14525 ?        S      0:00 /bin/bash ./monitor_ds.sh kvm-probes /var/lib/one//datastores 4124 20 0 idc-conode001
> 14526 ?        S      0:00 sudo vgdisplay --separator : --units m -o vg_size,vg_free --nosuffix --noheadings -C vg-one-0
> 14527 ?        S      0:00 vgdisplay --separator : --units m -o vg_size,vg_free --nosuffix --noheadings -C vg-one-0
> 15417 ?        S      0:00 [kdmflush]
> 15452 ?        Ss     0:00 sshd: oneadmin [priv]
> 15454 ?        S      0:00 sshd: oneadmin@notty
> 15455 ?        Ss     0:00 bash -s
> 15510 ?        Ss     0:00 sshd: oneadmin [priv]
> 15512 ?        S      0:00 sshd: oneadmin@notty
> 15513 ?        Ss     0:00 sh -s
> 15527 ?        S      0:00 sudo lvremove -f /dev/vg-one/lv-one-179-596-0
> 15528 ?        S      0:00 lvremove -f /dev/vg-one/lv-one-179-596-0
>
> I use locking type 3. I have 3 nodes and 1 front end, I use cman, and this
> is my cluster.conf configuration:
>
> <?xml version="1.0"?>
> <cluster name="idccluster" config_version="9">
>
>   <clusternodes>
>     <clusternode name="idc-vcoz01" votes="1" nodeid="1">
>       <fence><method name="single"><device name="idc-vcoz01"/></method></fence>
>     </clusternode>
>     <clusternode name="idc-conode001" votes="1" nodeid="2">
>       <fence><method name="single"><device name="idc-conode001"/></method></fence>
>     </clusternode>
>     <clusternode name="idc-conode002" votes="1" nodeid="3">
>       <fence><method name="single"><device name="idc-conode002"/></method></fence>
>     </clusternode>
>     <clusternode name="idc-conode003" votes="1" nodeid="4">
>       <fence><method name="single"><device name="idc-conode003"/></method></fence>
>     </clusternode>
>   </clusternodes>
>
>   <fencedevices>
>     <fencedevice name="idc-vcoz01" agent="fence_ipmilan"/>
>     <fencedevice name="idc-conode001" agent="fence_ipmilan"/>
>     <fencedevice name="idc-conode002" agent="fence_ipmilan"/>
>     <fencedevice name="idc-conode003" agent="fence_ipmilan"/>
>   </fencedevices>
>
>   <rm>
>     <failoverdomains/>
>     <resources/>
>   </rm>
> </cluster>
>
> I share /etc/cluster/cluster.conf over NFS. This is the output from
> cman_tool:
>
> Node  Sts   Inc   Joined               Name
>    1   M    304   2014-02-20 16:08:37  idc-vcoz01
>    2   M    288   2014-02-20 16:08:37  idc-conode001
>    3   M    304   2014-02-20 16:08:37  idc-conode002
>    4   M    312   2014-02-26 09:44:04  idc-conode003
>
> I think the VMs cannot run because they wait a long time for lvcreate or
> vgdisplay; see this:
>
> 30818 ?        S      0:00 sudo vgdisplay --separator : --units m -o vg_size,vg_free --nosuffix --noheadings -C vg-one-1
> 30819 ?        S      0:00 sudo vgdisplay --separator : --units m -o vg_size,vg_free --nosuffix --noheadings -C vg-one-1
> 30820 ?        S      0:00 sudo vgdisplay --separator : --units m -o vg_size,vg_free --nosuffix --noheadings -C vg-one-1
> 30821 ?        S      0:00 sudo vgdisplay --separator : --units m -o vg_size,vg_free --nosuffix --noheadings -C vg-one-1
> 30824 ?        S      0:00 sudo vgdisplay --separator : --units m -o vg_size,vg_free --nosuffix --noheadings -C vg-one-1
> 30825 ?        S      0:00 sudo vgdisplay --separator : --units m -o vg_size,vg_free --nosuffix --noheadings -C vg-one-1
> 30827 ?        S      0:00 sudo vgdisplay --separator : --units m -o vg_size,vg_free --nosuffix --noheadings -C vg-one-1
> 30842 ?        S      0:00 vgdisplay --separator : --units m -o vg_size,vg_free --nosuffix --noheadings -C vg-one-1
> 30843 ?        S      0:00 vgdisplay --separator : --units m -o vg_size,vg_free --nosuffix --noheadings -C vg-one-1
> 30844 ?        S      0:00 vgdisplay --separator : --units m -o vg_size,vg_free --nosuffix --noheadings -C vg-one-1
> 30845 ?        S      0:00 vgdisplay --separator : --units m -o vg_size,vg_free --nosuffix --noheadings -C vg-one-1
> 30846 ?        S      0:00 sudo vgdisplay --separator : --units m -o vg_size,vg_free --nosuffix --noheadings -C vg-one-1
> 30847 ?        S      0:00 vgdisplay --separator : --units m -o vg_size,vg_free --nosuffix --noheadings -C vg-one-1
> 30852 ?        S      0:00 vgdisplay --separator : --units m -o vg_size,vg_free --nosuffix --noheadings -C vg-one-1
> 30853 ?        S      0:00 vgdisplay --separator : --units m -o vg_size,vg_free --nosuffix --noheadings -C vg-one-1
> 30857 ?        S      0:00 vgdisplay --separator : --units m -o vg_size,vg_free --nosuffix --noheadings -C vg-one-1
>
> or:
>
> 30859 ?        S      0:00 sudo lvcreate -L20480.00M -n lv-one-179-610-0 vg-one
> 30860 ?        S      0:00 lvcreate -L20480.00M -n lv-one-179-610-0 vg-one
>
> If I restart all the servers and all the services, everything is fine, but
> after 3 or 4 days the problem comes back.
> This infrastructure is going to production, so I must find out how to fix
> this; I am not comfortable putting this configuration into production as it
> is. Please help me, and thanks.
>
> Rhesa.
> _______________________________________________
> Users mailing list
> [email protected]
> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>

--
Ruben S. Montero, PhD
Project co-Lead and Chief Architect
OpenNebula - Flexible Enterprise Cloud Made Simple
www.OpenNebula.org | [email protected] | @OpenNebula
