On Wed, 30 Jul 2014, Ruben S. Montero wrote:


Maybe you could try to execute the monitor probes on the node:

1. ssh the node
2. Go to /var/tmp/one/im
3. Execute run_probes kvm-probes
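
The steps above can be sketched as a small shell function that runs each executable probe in turn with a time limit, so that a probe waiting for input is reported instead of hanging the session. This is only a sketch under assumptions: the function name run_probes_with_timeout and the 10-second default are made up for illustration, GNU coreutils timeout is assumed to be installed, and none of this is part of OpenNebula's own run_probes.

```shell
# Hypothetical helper, not part of OpenNebula: run every executable file in a
# probe directory with a time limit and report the ones that fail or time out.
run_probes_with_timeout() {
    probe_dir=$1          # e.g. /var/tmp/one/im/kvm-probes.d
    probe_arg=$2          # e.g. kvm-probes, passed to each probe
    limit=${3:-10}        # seconds allowed per probe (assumed default)
    status=0

    for probe in "$probe_dir"/*; do
        # Skip anything that is not a regular, executable file.
        [ -f "$probe" ] && [ -x "$probe" ] || continue
        # stdin comes from /dev/null so a probe cannot read from our stdin;
        # GNU timeout exits with 124 when the limit is reached.
        timeout "$limit" "$probe" "$probe_arg" </dev/null
        rc=$?
        if [ "$rc" -eq 124 ]; then
            echo "TIMEOUT $(basename "$probe")" >&2
            status=1
        elif [ "$rc" -ne 0 ]; then
            echo "FAILED($rc) $(basename "$probe")" >&2
            status=1
        fi
    done
    return "$status"
}
```

On the node it could be called as: run_probes_with_timeout /var/tmp/one/im/kvm-probes.d kvm-probes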

When I do that (using sh -x), I get the following:

-bash-4.1$ sh -x ./run_probes kvm-probes
++ dirname ./run_probes
+ source ./../scripts_common.sh
++ export LANG=C
++ LANG=C
++ export PATH=/bin:/sbin:/usr/bin:/usr/krb5/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin
++ PATH=/bin:/sbin:/usr/bin:/usr/krb5/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin
++ AWK=awk
++ BASH=bash
++ CUT=cut
++ DATE=date
++ DD=dd
++ DF=df
++ DU=du
++ GREP=grep
++ ISCSIADM=iscsiadm
++ LVCREATE=lvcreate
++ LVREMOVE=lvremove
++ LVRENAME=lvrename
++ LVS=lvs
++ LN=ln
++ MD5SUM=md5sum
++ MKFS=mkfs
++ MKISOFS=genisoimage
++ MKSWAP=mkswap
++ QEMU_IMG=qemu-img
++ RADOS=rados
++ RBD=rbd
++ READLINK=readlink
++ RM=rm
++ SCP=scp
++ SED=sed
++ SSH=ssh
++ SUDO=sudo
++ SYNC=sync
++ TAR=tar
++ TGTADM=tgtadm
++ TGTADMIN=tgt-admin
++ TGTSETUPLUN=tgt-setup-lun-one
++ TR=tr
++ VGDISPLAY=vgdisplay
++ VMKFSTOOLS=vmkfstools
++ WGET=wget
+++ uname -s
++ '[' xLinux = xLinux ']'
++ SED='sed -r'
+++ basename ./run_probes
++ SCRIPT_NAME=run_probes
+ export LANG=C
+ LANG=C
+ HYPERVISOR_DIR=kvm-probes.d
+ ARGUMENTS=kvm-probes
++ dirname ./run_probes
+ SCRIPTS_DIR=.
+ cd .
++ '[' -d kvm-probes.d ']'
++ run_dir kvm-probes.d
++ cd kvm-probes.d
+++ ls architecture.sh collectd-client-shepherd.sh cpu.sh kvm.rb monitor_ds.sh name.sh poll.sh version.sh
++ for i in '`ls *`'
++ '[' -x architecture.sh ']'
++ ./architecture.sh kvm-probes
++ EXIT_CODE=0
++ '[' x0 '!=' x0 ']'
++ for i in '`ls *`'
++ '[' -x collectd-client-shepherd.sh ']'
++ ./collectd-client-shepherd.sh kvm-probes
++ EXIT_CODE=0
++ '[' x0 '!=' x0 ']'
++ for i in '`ls *`'
++ '[' -x cpu.sh ']'
++ ./cpu.sh kvm-probes
++ EXIT_CODE=0
++ '[' x0 '!=' x0 ']'
++ for i in '`ls *`'
++ '[' -x kvm.rb ']'
++ ./kvm.rb kvm-probes
++ EXIT_CODE=0
++ '[' x0 '!=' x0 ']'
++ for i in '`ls *`'
++ '[' -x monitor_ds.sh ']'
++ ./monitor_ds.sh kvm-probes
[sudo] password for oneadmin:

and it hangs at the password prompt for oneadmin.
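
To confirm that it is sudo asking for the password, and to do so without risking another hang, something like the following could be run as oneadmin on the node. This is a minimal sketch, assuming the installed sudo supports the -n (non-interactive) option; the function name is made up for illustration, and SUDO defaults to sudo as in the scripts_common.sh trace above.

```shell
# Hypothetical check, not part of OpenNebula: succeeds only if the current
# user can run a command through sudo without being asked for a password.
SUDO=${SUDO:-sudo}

can_sudo_without_password() {
    # -n makes sudo fail immediately instead of prompting for a password.
    $SUDO -n true 2>/dev/null
}

if can_sudo_without_password; then
    echo "sudo works non-interactively for $(id -un)"
else
    echo "sudo would prompt for a password for $(id -un)"
fi
```

If the second message is printed, the probe is presumably blocked on exactly this prompt.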

What's going on?

Also, you mentioned collectd. Are you saying that OpenNebula 4.6 now needs to run a daemon on every single VM host? Where is the setup documented?

Steve







Make sure you do not have a host using the same hostname fgtest14 and running a collectd process.

On Jul 29, 2014 4:35 PM, "Steven Timm" <t...@fnal.gov> wrote:

      I am still trying to debug a nasty monitoring inconsistency.

      -bash-4.1$ onevm list | grep fgtest14
          26 oneadmin oneadmin fgt6x4-26       runn    6      4G fgtest14   117d 19h50
          27 oneadmin oneadmin fgt5x4-27       runn   10      4G fgtest14   117d 17h57
          28 oneadmin oneadmin fgt1x1-28       runn   10    4.1G fgtest14   117d 16h59
          30 oneadmin oneadmin fgt5x1-30       runn    0      4G fgtest14   116d 23h50
          33 oneadmin oneadmin ip6sl5vda-33    runn    6      4G fgtest14   116d 19h57
      -bash-4.1$ onehost list
        ID NAME            CLUSTER   RVM      ALLOCATED_CPU      ALLOCATED_MEM STAT
         3 fgtest11        ipv6        0       0 / 400 (0%)    0K / 15.7G (0%) on
         4 fgtest12        ipv6        0       0 / 400 (0%)    0K / 15.7G (0%) on
         7 fgtest13        ipv6        0       0 / 800 (0%)    0K / 23.6G (0%) on
         8 fgtest14        ipv6        5       0 / 800 (0%)    0K / 23.6G (0%) on
         9 fgtest20        ipv6        3    300 / 800 (37%)  12G / 31.4G (38%) on
        11 fgtest19        ipv6        0       0 / 800 (0%)    0K / 31.5G (0%) on
      -bash-4.1$ onehost show 8
      HOST 8 INFORMATION
      ID                    : 8
      NAME                  : fgtest14
      CLUSTER               : ipv6
      STATE                 : MONITORED
      IM_MAD                : kvm
      VM_MAD                : kvm
      VN_MAD                : dummy
      LAST MONITORING TIME  : 07/29 09:25:45

      HOST SHARES
      TOTAL MEM             : 23.6G
      USED MEM (REAL)       : 876.4M
      USED MEM (ALLOCATED)  : 0K
      TOTAL CPU             : 800
      USED CPU (REAL)       : 0
      USED CPU (ALLOCATED)  : 0
      RUNNING VMS           : 5

      LOCAL SYSTEM DATASTORE #102 CAPACITY
      TOTAL:                : 548.8G
      USED:                 : 175.3G
      FREE:                 : 345.6G

      MONITORING INFORMATION
      ARCH="x86_64"
      CPUSPEED="2992"
      HOSTNAME="fgtest14.fnal.gov"
      HYPERVISOR="kvm"
      MODELNAME="Intel(R) Xeon(R) CPU           E5450  @ 3.00GHz"
      NETRX="234844577"
      NETTX="21553126"
      RESERVED_CPU=""
      RESERVED_MEM=""
      VERSION="4.6.0"

      VIRTUAL MACHINES

          ID USER     GROUP    NAME            STAT UCPU    UMEM HOST TIME
          26 oneadmin oneadmin fgt6x4-26       runn    6      4G fgtest14   117d 19h50
          27 oneadmin oneadmin fgt5x4-27       runn   10      4G fgtest14   117d 17h57
          28 oneadmin oneadmin fgt1x1-28       runn   10    4.1G fgtest14   117d 17h00
          30 oneadmin oneadmin fgt5x1-30       runn    0      4G fgtest14   116d 23h50
          33 oneadmin oneadmin ip6sl5vda-33    runn    6      4G fgtest14   116d 19h57

      -----------------------------------------------------------------------------------

      All of this looks great, right?
      Just one problem: there are no VMs running on fgtest14, and
      there haven't been for 4 days.

      [root@fgtest14 ~]# virsh list
       Id    Name                           State
      ----------------------------------------------------

      [root@fgtest14 ~]#

      -------------------------------------------------------------------------
      Yet the monitoring reports no errors.

      Tue Jul 29 09:28:10 2014 [InM][D]: Host fgtest14 (8) successfully monitored.

      -----------------------------------------------------------------------------
      At the same time, there is no evidence that ONE is actually trying to
      monitor these five VMs, let alone succeeding, yet they are still stuck
      in "runn", which means I can't do a onevm restart to restart them.
      (The VM images of these 5 VMs are still out there on the VM host and
      I would like to save and restart them if I can.)

      What is the remotes command that ONE4.6 would use to monitor this host?
      Can I do it manually and see what output I get?

      Are we dealing with some kind of a bug, or just a very confused system?
      Any help is appreciated. I have to get this sorted out before
      I dare deploy one4.x in production.

      Steve Timm


      ------------------------------------------------------------------
      Steven C. Timm, Ph.D  (630) 840-8525
      t...@fnal.gov  http://home.fnal.gov/~timm/
      Fermilab Scientific Computing Division, Scientific Computing Services Quad.
      Grid and Cloud Services Dept., Associate Dept. Head for Cloud Computing
      _______________________________________________
      Users mailing list
      Users@lists.opennebula.org
      http://lists.opennebula.org/listinfo.cgi/users-opennebula.org



