Hi,

1.- monitor_ds.sh may use LVM commands (vgdisplay) that need sudo access. This should be set up automatically by the OpenNebula node packages.
2.- It is not a real daemon: the first time a host is monitored, a process is left running on it to send information periodically. OpenNebula restarts it if no information is received in 3 monitoring steps. Nothing needs to be set up...

Cheers

On Wed, Jul 30, 2014 at 3:50 PM, Steven Timm <[email protected]> wrote:
> On Wed, 30 Jul 2014, Ruben S. Montero wrote:
>
>> Maybe you could try to execute the monitor probes in the node,
>>
>> 1. ssh the node
>> 2. Go to /var/tmp/one/im
>> 3. Execute run_probes kvm-probes
>>
>
> When I do that (using sh -x), I get the following:
>
> -bash-4.1$ sh -x ./run_probes kvm-probes
> ++ dirname ./run_probes
> + source ./../scripts_common.sh
> ++ export LANG=C
> ++ LANG=C
> ++ export PATH=/bin:/sbin:/usr/bin:/usr/krb5/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin
> ++ PATH=/bin:/sbin:/usr/bin:/usr/krb5/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin
> ++ AWK=awk
> ++ BASH=bash
> ++ CUT=cut
> ++ DATE=date
> ++ DD=dd
> ++ DF=df
> ++ DU=du
> ++ GREP=grep
> ++ ISCSIADM=iscsiadm
> ++ LVCREATE=lvcreate
> ++ LVREMOVE=lvremove
> ++ LVRENAME=lvrename
> ++ LVS=lvs
> ++ LN=ln
> ++ MD5SUM=md5sum
> ++ MKFS=mkfs
> ++ MKISOFS=genisoimage
> ++ MKSWAP=mkswap
> ++ QEMU_IMG=qemu-img
> ++ RADOS=rados
> ++ RBD=rbd
> ++ READLINK=readlink
> ++ RM=rm
> ++ SCP=scp
> ++ SED=sed
> ++ SSH=ssh
> ++ SUDO=sudo
> ++ SYNC=sync
> ++ TAR=tar
> ++ TGTADM=tgtadm
> ++ TGTADMIN=tgt-admin
> ++ TGTSETUPLUN=tgt-setup-lun-one
> ++ TR=tr
> ++ VGDISPLAY=vgdisplay
> ++ VMKFSTOOLS=vmkfstools
> ++ WGET=wget
> +++ uname -s
> ++ '[' xLinux = xLinux ']'
> ++ SED='sed -r'
> +++ basename ./run_probes
> ++ SCRIPT_NAME=run_probes
> + export LANG=C
> + LANG=C
> + HYPERVISOR_DIR=kvm-probes.d
> + ARGUMENTS=kvm-probes
> ++ dirname ./run_probes
> + SCRIPTS_DIR=.
> + cd .
> ++ '[' -d kvm-probes.d ']'
> ++ run_dir kvm-probes.d
> ++ cd kvm-probes.d
> +++ ls architecture.sh collectd-client-shepherd.sh cpu.sh kvm.rb monitor_ds.sh name.sh poll.sh version.sh
> ++ for i in '`ls *`'
> ++ '[' -x architecture.sh ']'
> ++ ./architecture.sh kvm-probes
> ++ EXIT_CODE=0
> ++ '[' x0 '!=' x0 ']'
> ++ for i in '`ls *`'
> ++ '[' -x collectd-client-shepherd.sh ']'
> ++ ./collectd-client-shepherd.sh kvm-probes
> ++ EXIT_CODE=0
> ++ '[' x0 '!=' x0 ']'
> ++ for i in '`ls *`'
> ++ '[' -x cpu.sh ']'
> ++ ./cpu.sh kvm-probes
> ++ EXIT_CODE=0
> ++ '[' x0 '!=' x0 ']'
> ++ for i in '`ls *`'
> ++ '[' -x kvm.rb ']'
> ++ ./kvm.rb kvm-probes
> ++ EXIT_CODE=0
> ++ '[' x0 '!=' x0 ']'
> ++ for i in '`ls *`'
> ++ '[' -x monitor_ds.sh ']'
> ++ ./monitor_ds.sh kvm-probes
> [sudo] password for oneadmin:
>
> and it stays hung on the password for oneadmin.
>
> What's going on?
>
> Also, you mentioned collectd. Are you saying that OpenNebula 4.6 now
> needs to run a daemon on every single VM host? Where is it documented
> how to set it up?
>
> Steve
>
>> Make sure you do not have a host using the same hostname fgtest14 and
>> running a collectd process
>>
>> On Jul 29, 2014 4:35 PM, "Steven Timm" <[email protected]> wrote:
>>
>> I am still trying to debug a nasty monitoring inconsistency.
>>
>> -bash-4.1$ onevm list | grep fgtest14
>>     26 oneadmin oneadmin fgt6x4-26      runn    6     4G fgtest14 117d 19h50
>>     27 oneadmin oneadmin fgt5x4-27      runn   10     4G fgtest14 117d 17h57
>>     28 oneadmin oneadmin fgt1x1-28      runn   10   4.1G fgtest14 117d 16h59
>>     30 oneadmin oneadmin fgt5x1-30      runn    0     4G fgtest14 116d 23h50
>>     33 oneadmin oneadmin ip6sl5vda-33   runn    6     4G fgtest14 116d 19h57
>> -bash-4.1$ onehost list
>>   ID NAME        CLUSTER   RVM      ALLOCATED_CPU      ALLOCATED_MEM STAT
>>    3 fgtest11    ipv6        0       0 / 400 (0%)    0K / 15.7G (0%) on
>>    4 fgtest12    ipv6        0       0 / 400 (0%)    0K / 15.7G (0%) on
>>    7 fgtest13    ipv6        0       0 / 800 (0%)    0K / 23.6G (0%) on
>>    8 fgtest14    ipv6        5       0 / 800 (0%)    0K / 23.6G (0%) on
>>    9 fgtest20    ipv6        3    300 / 800 (37%)  12G / 31.4G (38%) on
>>   11 fgtest19    ipv6        0       0 / 800 (0%)    0K / 31.5G (0%) on
>> -bash-4.1$ onehost show 8
>> HOST 8 INFORMATION
>> ID                    : 8
>> NAME                  : fgtest14
>> CLUSTER               : ipv6
>> STATE                 : MONITORED
>> IM_MAD                : kvm
>> VM_MAD                : kvm
>> VN_MAD                : dummy
>> LAST MONITORING TIME  : 07/29 09:25:45
>>
>> HOST SHARES
>> TOTAL MEM             : 23.6G
>> USED MEM (REAL)       : 876.4M
>> USED MEM (ALLOCATED)  : 0K
>> TOTAL CPU             : 800
>> USED CPU (REAL)       : 0
>> USED CPU (ALLOCATED)  : 0
>> RUNNING VMS           : 5
>>
>> LOCAL SYSTEM DATASTORE #102 CAPACITY
>> TOTAL:                : 548.8G
>> USED:                 : 175.3G
>> FREE:                 : 345.6G
>>
>> MONITORING INFORMATION
>> ARCH="x86_64"
>> CPUSPEED="2992"
>> HOSTNAME="fgtest14.fnal.gov"
>> HYPERVISOR="kvm"
>> MODELNAME="Intel(R) Xeon(R) CPU E5450 @ 3.00GHz"
>> NETRX="234844577"
>> NETTX="21553126"
>> RESERVED_CPU=""
>> RESERVED_MEM=""
>> VERSION="4.6.0"
>>
>> VIRTUAL MACHINES
>>
>>     ID USER     GROUP    NAME           STAT UCPU  UMEM HOST     TIME
>>     26 oneadmin oneadmin fgt6x4-26      runn    6    4G fgtest14 117d 19h50
>>     27 oneadmin oneadmin fgt5x4-27      runn   10    4G fgtest14 117d 17h57
>>     28 oneadmin oneadmin fgt1x1-28      runn   10  4.1G fgtest14 117d 17h00
>>     30 oneadmin oneadmin fgt5x1-30      runn    0    4G fgtest14 116d 23h50
>>     33 oneadmin oneadmin ip6sl5vda-33   runn    6    4G fgtest14 116d 19h57
>> -----------------------------------------------------------------------------------
>>
>> All of this looks great, right?
>> Just one problem: there are no VMs running on fgtest14, and there
>> haven't been for 4 days.
>>
>> [root@fgtest14 ~]# virsh list
>>  Id    Name                           State
>> ----------------------------------------------------
>>
>> [root@fgtest14 ~]#
>>
>> -------------------------------------------------------------------------
>> Yet the monitoring reports no errors.
>>
>> Tue Jul 29 09:28:10 2014 [InM][D]: Host fgtest14 (8) successfully monitored.
>>
>> -----------------------------------------------------------------------------
>> At the same time, there is no evidence that ONE is actually trying to
>> monitor, or succeeding in monitoring, these five VMs, yet they are still
>> stuck in "runn", which means I can't do a onevm restart to restart them.
>> (The VM images of these 5 VMs are still out there on the VM host, and
>> I would like to save and restart them if I can.)
>>
>> What is the remotes command that ONE 4.6 would use to monitor this host?
>> Can I run it manually and see what output I get?
>>
>> Are we dealing with some kind of bug, or just a very confused system?
>> Any help is appreciated. I have to get this sorted out before
>> I dare deploy ONE 4.x in production.
>>
>> Steve Timm
>>
>> ------------------------------------------------------------------
>> Steven C. Timm, Ph.D  (630) 840-8525
>> [email protected]  http://home.fnal.gov/~timm/
>> Fermilab Scientific Computing Division, Scientific Computing Services Quad.
>> Grid and Cloud Services Dept., Associate Dept. Head for Cloud Computing
>> _______________________________________________
>> Users mailing list
>> [email protected]
>> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org

--
Ruben S. Montero, PhD
Project co-Lead and Chief Architect
OpenNebula - Flexible Enterprise Cloud Made Simple
www.OpenNebula.org | [email protected] | @OpenNebula
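Ruben's first point and Steve's hang at "[sudo] password for oneadmin:" both come down to whether oneadmin can run the LVM command without a password. A minimal check, sketched under the assumption that it is run as oneadmin on the node (fgtest14 here), uses `sudo -n`, which makes sudo exit with an error instead of prompting, so the check cannot hang the way run_probes did. The function name `check_sudo_nopasswd` is hypothetical and not part of OpenNebula.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of OpenNebula): report whether a command can
# run under sudo without a password prompt. Run it as oneadmin on the node.
# sudo -n (non-interactive) fails instead of prompting, so this never hangs.
check_sudo_nopasswd() {
  if sudo -n "$@" >/dev/null 2>&1; then
    echo "ok: sudo $* runs without a password"
  else
    # Covers: password required, command not allowed, sudo missing,
    # or the command itself exiting non-zero.
    echo "FAIL: sudo $* needs a password, is not allowed, or failed"
    return 1
  fi
}
```

For example, `check_sudo_nopasswd vgdisplay` on the node. If it prints FAIL only when run over ssh, note that the stock RHEL/CentOS 6 sudoers file contains `Defaults requiretty`, which also stops sudo from running without a terminal. The exact sudoers entries installed by the OpenNebula node packages are not shown in this thread.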
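Steve's symptom is a disagreement between `onevm list`, which shows five VMs in `runn` on fgtest14, and `virsh list` on fgtest14, which shows none. A small shell function can make that comparison explicit. This is a sketch under stated assumptions: the name `stale_vms` is hypothetical, it assumes the OpenNebula KVM driver names libvirt domains `one-<VMID>`, and it assumes VM names contain no spaces, so HOST is the eighth column of each `onevm list` row.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print IDs of VMs that `onevm list` places on a host
# but that libvirt is not running there.
# Assumptions: libvirt domains are named "one-<VMID>", and VM names have no
# spaces, so columns are ID USER GROUP NAME STAT UCPU UMEM HOST TIME.
stale_vms() {
  local onevm_out=$1   # text output of `onevm list`
  local virsh_names=$2 # domain names running on the host, one per line
  local host=$3        # host name as shown in the HOST column
  local id
  while read -r id; do
    # -q: quiet, -x: whole-line match, -F: fixed string (no regex)
    if ! grep -qxF "one-${id}" <<<"$virsh_names"; then
      echo "$id"
    fi
  # Skip the header row (non-numeric ID) and rows for other hosts
  done < <(awk -v h="$host" '$1 ~ /^[0-9]+$/ && $8 == h { print $1 }' <<<"$onevm_out")
}
```

Hypothetical usage from the front-end, assuming oneadmin can ssh to the node and reach the system libvirt instance: `stale_vms "$(onevm list)" "$(ssh fgtest14 virsh -c qemu:///system list --name)" fgtest14`. With the listings in this thread, where `virsh list` on fgtest14 is empty, it would be expected to print 26, 27, 28, 30 and 33.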
