Thanks Ruben.

onedb fsck turned up and fixed a bunch of problems, including the main
one--that fgtest14 had once been host ID 10 and I had mistakenly
re-inserted it into the db as host ID 8.  I had to manually modify the
mysql rows for those 5 entries in vm_pool to change the <HID> from
10 to 8, but once I did, OpenNebula finally detected
that they were down and now shows them as UNKN.
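
(For anyone hitting the same thing: a targeted substitution like that can
be scripted rather than retyped.  This is only a sketch under my
assumptions, not a tested tool -- the helper below is hypothetical, and in
MySQL itself something like
REPLACE(body, '<HID>10</HID>', '<HID>8</HID>') in an UPDATE may do the
same thing server-side.)

```python
# Hypothetical sketch of the <HID> fix as a plain string substitution on
# the body blob -- not something I ran verbatim.  The MySQL equivalent
# would be roughly:
#   UPDATE vm_pool SET body = REPLACE(body, '<HID>10</HID>', '<HID>8</HID>')
#   WHERE oid IN (...);

def fix_hid(body: str, old_hid: int, new_hid: int) -> str:
    """Rewrite <HID>old</HID> to <HID>new</HID> in a vm_pool body blob."""
    return body.replace("<HID>%d</HID>" % old_hid,
                        "<HID>%d</HID>" % new_hid)

# Example on a fragment of the history record:
frag = "<HISTORY><OID>26</OID><HID>10</HID></HISTORY>"
print(fix_hid(frag, 10, 8))  # <HISTORY><OID>26</OID><HID>8</HID></HISTORY>
```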

There is one remaining problem and that is the following:

To successfully modify the BODY field in the vm_pool of the mysql database,
it was necessary to strip out some newlines and single quotes that were in
the XML, so now I have XML that doesn't actually work to start a VM.
(I ran a mysql command

update vm_pool set body='a bunch of xml' where oid=nnn;

and that syntax supported neither newlines nor single quotes.)  That's a
problem because some of the things we are using need single quotes,
and maybe newlines too.
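
(For what it's worth, single quotes and raw newlines can be fed through a
mysql string literal if they're escaped -- doubling each single quote is
the standard trick.  A hedged sketch, assuming the statement is built in a
script; a parameterized query through a client library avoids the escaping
entirely.)

```python
# Hedged sketch: quote an XML body for use as a MySQL single-quoted string
# literal.  Assumptions: standard MySQL escaping (backslash doubled, single
# quote doubled); raw newlines are legal inside a literal and pass through
# untouched.  A parameterized query with any DB-API driver, e.g.
#   cursor.execute("UPDATE vm_pool SET body=%s WHERE oid=%s", (body, oid))
# sidesteps the escaping problem completely.

def mysql_quote(s: str) -> str:
    """Return s as a MySQL single-quoted string literal."""
    return "'" + s.replace("\\", "\\\\").replace("'", "''") + "'"

body = "<serial type='pty'>\n<target port='0'/>"
print("update vm_pool set body=%s where oid=26;" % mysql_quote(body))
```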

Does anyone have an XML editor that can more easily modify the
text of the body field in the OpenNebula database?
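
(I don't know of a purpose-built editor, but a short script can round-trip
the body through a real XML parser.  A sketch with Python's xml.etree,
with the caveat that ElementTree re-serializes CDATA sections as plain
escaped text, so for bodies that rely on CDATA wrappers -- like the
RAW/DATA block -- lxml or a plain string substitution may be safer.)

```python
# Sketch of an "XML editor" for an OpenNebula body field -- an assumed
# example, not a tested tool.  Caveat: ElementTree parses <![CDATA[..]]>
# but writes the content back as escaped text, so bodies that depend on
# CDATA wrappers may need lxml (which can emit CDATA) or plain string
# edits instead.
import xml.etree.ElementTree as ET

def set_field(body_xml: str, path: str, value: str) -> str:
    """Set the text of the first element matching path and re-serialize."""
    root = ET.fromstring(body_xml)
    node = root.find(path)
    if node is None:
        raise KeyError(path)
    node.text = value
    return ET.tostring(root, encoding="unicode")

# Example: flip a host body's STATE field.
host = "<HOST><ID>8</ID><STATE>2</STATE></HOST>"
print(set_field(host, "STATE", "3"))  # <HOST><ID>8</ID><STATE>3</STATE></HOST>
```

The edited body would then go back into mysql through a parameterized
UPDATE rather than a hand-typed literal.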

Steve Timm

(P.S. Before, the XML in question looked like this:


                <devices>
                <serial type='pty'>
                        <target port='0'/>
                </serial>
                <console type='pty'>
                <target type='serial' port='0'/>
                </console>
                </devices>


And now it looks like this:

<devices> <serial type=pty> <target port=0/> </serial> <console type=pty> <target type=serial port=0/> </console> </devices>
---

Steve Timm



On Wed, 30 Jul 2014, Ruben S. Montero wrote:

This seems to be a problem when upgrading the DB. See the inconsistency in
fgtest14:

<RUNNING_VMS>5</RUNNING_VMS>....<VMS></VMS>

That's the reason for not seeing any action taken on VM 26: it is not
registered in the host (empty <VMS> set).

I suggest stopping oned and executing onedb fsck.

Cheers


On Wed, Jul 30, 2014 at 4:44 PM, Steven Timm <[email protected]> wrote:
      OK--I have now installed the opennebula-node-kvm rpm on all of the
      VM hosts (SURPRISE), made sure that the collectd that is running is
      the current one from OpenNebula 4.6, and verified that run_probes
      kvm-probes can run interactively as oneadmin on all of the nodes.
      The one on fgtest14 correctly reports that there are no running VMs,
      and the two machines that do have running VMs correctly report that
      they do.

      The only problem is that the five virtual machines OpenNebula still
      thinks are running on fgtest14 still report back as running, even
      though OpenNebula hasn't made any attempt to monitor them.

      How do we get things back into sync and tell OpenNebula that VM #26
      isn't really running anymore? Is there a way to force this VM into
      "unknown" state so we can do a onevm boot on it? Database hackery
      included? Even better, has someone come up with an XML hacker to do
      the XML substitution of one field in the huge mysql field?

      Even more important: it's clear that the monitoring had been failing,
      and failing for a long time, because we didn't have the sudoers file
      there that opennebula-node-kvm provides.  But there was absolutely
      no warning of that; as far as the head node was concerned, we were
      happy as a clam.


      ----

      The important pieces of output from run_probes kvm-probes

      fgtest19
      ARCH=x86_64
      MODELNAME="Intel(R) Xeon(R) CPU           E5450  @ 3.00GHz"
      HYPERVISOR=kvm
      TOTALCPU=800
      CPUSPEED=2992
      TOTALMEMORY=33010680
      USEDMEMORY=1586216
      FREEMEMORY=31424464
      FREECPU=800.0
      USEDCPU=0.0
      NETRX=5958104400
      NETTX=2323329968
      DS_LOCATION_USED_MB=1924
      DS_LOCATION_TOTAL_MB=280380
      DS_LOCATION_FREE_MB=264129
      DS = [
        ID = 102,
        USED_MB = 1924,
        TOTAL_MB = 280380,
        FREE_MB = 264129
      ]
      HOSTNAME=fgtest19.fnal.gov
      VM_POLL=YES
      VM=[
        ID=55,
        DEPLOY_ID=one-55,
        POLL="NETRX=25289118 USEDCPU=0.0 NETTX=214808 USEDMEMORY=4194304 STATE=a" ]
      VERSION="4.6.0"
      fgtest20
      ARCH=x86_64
      MODELNAME="Intel(R) Xeon(R) CPU           E5450  @ 3.00GHz"
      HYPERVISOR=kvm
      TOTALCPU=800
      CPUSPEED=2992
      TOTALMEMORY=32875804
      USEDMEMORY=8801100
      FREEMEMORY=24074704
      FREECPU=793.6
      USEDCPU=6.39999999999998
      NETRX=184155823062
      NETTX=58685116817
      DS_LOCATION_USED_MB=50049
      DS_LOCATION_TOTAL_MB=281012
      DS_LOCATION_FREE_MB=216499
      DS = [
        ID = 102,
        USED_MB = 50049,
        TOTAL_MB = 281012,
        FREE_MB = 216499
      ]
      HOSTNAME=fgtest20.fnal.gov
      VM_POLL=YES
      VM=[
        ID=31,
        DEPLOY_ID=one-31,
        POLL="NETRX=71728978887 USEDCPU=0.5 NETTX=54281255903 USEDMEMORY=4270812 STATE=a" ]
      VM=[
        ID=24,
        DEPLOY_ID=one-24,
        POLL="NETRX=2383960153 USEDCPU=0.0 NETTX=17345416 USEDMEMORY=4194304 STATE=a" ]
      VM=[
        ID=48,
        DEPLOY_ID=one-48,
        POLL="NETRX=2546074171 USEDCPU=0.0 NETTX=145782495 USEDMEMORY=4194304 STATE=a" ]
      VERSION="4.6.0"

      fgtest14
      ARCH=x86_64
      MODELNAME="Intel(R) Xeon(R) CPU           E5450  @ 3.00GHz"
      HYPERVISOR=kvm
      TOTALCPU=800
      CPUSPEED=2992
      TOTALMEMORY=24736796
      USEDMEMORY=937004
      FREEMEMORY=23799792
      FREECPU=800.0
      USEDCPU=0.0
      NETRX=285471609
      NETTX=25467521
      DS_LOCATION_USED_MB=179498
      DS_LOCATION_TOTAL_MB=561999
      DS_LOCATION_FREE_MB=353864
      DS = [
        ID = 102,
        USED_MB = 179498,
        TOTAL_MB = 561999,
        FREE_MB = 353864
      ]

      -------------------------
      And the appropriate excerpts from oned.log:

      /var/log/one/oned.log.20140728111811:Fri Jul 25 15:22:05 2014 [DiM][D]: Restarting VM 26
      /var/log/one/oned.log.20140728111811:Fri Jul 25 15:22:05 2014 [DiM][E]: Could not restart VM 26, wrong state.
      /var/log/one/oned.log.20140728111811:Fri Jul 25 15:37:48 2014 [DiM][D]: Stopping VM 26
      /var/log/one/oned.log.20140728111811:Fri Jul 25 15:37:48 2014 [VMM][D]: VM 26 successfully monitored: STATE=-
      -----------------------------------

      This is the mysql row in host_pool for host fgtest14
      mysql>
      mysql> select * from host_pool where oid=8 \G
      *************************** 1. row ***************************
                oid: 8
               name: fgtest14
               
body:<HOST><ID>8</ID><NAME>fgtest14</NAME><STATE>2</STATE><IM_MAD>kvm</IM_MAD><VM_MAD>kvm</VM_MAD><VN_MAD>dummy</VN_MAD><LAST_MON_TIME>1
406731190</LAST_MON_TIME><CLUSTER_ID>101</CLUSTER_ID><CLUSTER>ipv6</CLUSTER><HOST_SHARE><DISK_USAGE>0</DISK_USAGE><MEM_USAGE>0</MEM
_USAGE><CPU_USAGE>0</CPU_USAGE><MAX_DISK>561999</MAX_DISK><MAX_MEM>24736796</MAX_MEM><MAX_CPU>800</MAX_CPU><FREE_DISK>353864</FREE_
DISK><FREE_MEM>23802216</FREE_MEM><FREE_CPU>800</FREE_CPU><USED_DISK>179498</USED_DISK><USED_MEM>934580</USED_MEM><USED_CPU>0</USED
_CPU><RUNNING_VMS>5</RUNNING_VMS><DATASTORES><DS><FREE_MB><![CDATA[353864]]></FREE_MB><ID><![CDATA[102]]></ID><TOTAL_MB><![CDATA[56
1999]]></TOTAL_MB><USED_MB><![CDATA[179498]]></USED_MB></DS></DATASTORES></HOST_SHARE><VMS></VMS><TEMPLATE><ARCH><![CDATA[x86_64]]>
</ARCH><CPUSPEED><![CDATA[2992]]></CPUSPEED><HOSTNAME><![CDATA[fgtest14.fnal.gov]]></HOSTNAME><HYPERVISOR><![CDATA[kvm]]></HYPERVIS
      OR><MODELNAME><![CDATA[Intel(R) Xeon(R) CPU           E5450  
@3.00GHz]]></MODELNAME><NETRX><![CDATA[285677608]]></NETRX><NETTX><![CDATA[25489275]]></NETTX><RESERVED_CPU><![CDATA[]]></RESERVED_C
      
PU><RESERVED_MEM><![CDATA[]]></RESERVED_MEM><VERSION><![CDATA[4.6.0]]></VERSION></TEMPLATE></HOST>
              state: 2
      last_mon_time: 1406731190
                uid: 0
                gid: 0
            owner_u: 1
            group_u: 0
            other_u: 0
                cid: 101
      1 row in set (0.00 sec)



      And this is the row in vm_pool for VM id 26

      *************************** 1. row ***************************
            oid: 26
           name: fgt6x4-26
           
body:<VM><ID>26</ID><UID>0</UID><GID>0</GID><UNAME>oneadmin</UNAME><GNAME>oneadmin</GNAME><NAME>fgt6x4-26</NAME><PERMISSIONS><OWNER_U>1<
/OWNER_U><OWNER_M>1</OWNER_M><OWNER_A>0</OWNER_A><GROUP_U>0</GROUP_U><GROUP_M>0</GROUP_M><GROUP_A>0</GROUP_A><OTHER_U>0</OTHER_U><O
THER_M>0</OTHER_M><OTHER_A>0</OTHER_A></PERMISSIONS><LAST_POLL>1406320668</LAST_POLL><STATE>3</STATE><LCM_STATE>3</LCM_STATE><RESCH
ED>0</RESCHED><STIME>1396463735</STIME><ETIME>0</ETIME><DEPLOY_ID>one-26</DEPLOY_ID><MEMORY>4194304</MEMORY><CPU>6</CPU><NET_TX>748
      982286</NET_TX><NET_RX>1588690678</NET_RX><TEMPLATE><AUTOMATIC_REQUIREMENTS><![CDATA[CLUSTER_ID = 101 
& !(PUBLIC_CLOUD 
=YES)]]></AUTOMATIC_REQUIREMENTS><CONTEXT><CTX_USER><![CDATA[PFVTRVI+PElEPjA8L0lEPjxHSUQ+MDwvR0lEPjxHUk9VUFM+PElEPjA8L0lEPjwvR1JPVVB
TPjxHTkFNRT5vbmVhZG1pbjwvR05BTUU+PE5BTUU+b25lYWRtaW48L05BTUU+PFBBU1NXT1JEPjFmNjQxYzdlMzZkZWU5MmUzNDQ0Mjk2NmI1OTYwMGJkMGE3ZmU5ZDQ8L1
BBU1NXT1JEPjxBVVRIX0RSSVZFUj5jb3JlPC9BVVRIX0RSSVZFUj48RU5BQkxFRD4xPC9FTkFCTEVEPjxURU1QTEFURT48VE9LRU5fUEFTU1dPUkQ+PCFbQ0RBVEFbNzFhY
zU0OWM5MzhmNjA0NmY3NDEzMDI4Y2ZhOGNjODU2YzI2ZGNhNV1dPjwvVE9LRU5fUEFTU1dPUkQ+PC9URU1QTEFURT48REFUQVNUT1JFX1FVT1RBPjwvREFUQVNUT1JFX1FV
T1RBPjxORVRXT1JLX1FVT1RBPjwvTkVUV09SS19RVU9UQT48Vk1fUVVPVEE+PC9WTV9RVU9UQT48SU1BR0VfUVVPVEE+PC9JTUFHRV9RVU9UQT48L1VTRVI+]]></CTX_US
ER><DISK_ID><![CDATA[2]]></DISK_ID><ETH0_DNS><![CDATA[131.225.0.254]]></ETH0_DNS><ETH0_GATEWAY><![CDATA[131.225.41.200]]></ETH0_GAT
EWAY><ETH0_IP><![CDATA[131.225.41.169]]></ETH0_IP><ETH0_IPV6><![CDATA[2001:400:2410:29::169]]></ETH0_IPV6><ETH0_MAC><![CDATA[00:16:
3e:06:06:04]]></ETH0_MAC><ETH0_MASK><![CDATA[255.255.255.128]]></ETH0_MASK><FILES><![CDATA[/cloud/images/OpenNebula/scripts/one3.2/
      contextualization/init.sh 
/cloud/images/OpenNebula/scripts/one3.2/contextualization/credentials.sh/cloud/images/OpenNebula/scripts/one3.2/contextualization/kerberos.sh]]></FILES><GATEWAY><![CDATA[131.225.41.200]]></GATEWAY><INIT_
      SCRIPTS><![CDATA[init.sh 
credentials.shkerberos.sh]]></INIT_SCRIPTS><IP_PUBLIC><![CDATA[131.225.41.169]]></IP_PUBLIC><NETMASK><![CDATA[255.255.255.128]]></NETMASK><NETWOR
K><![CDATA[YES]]></NETWORK><ROOT_PUBKEY><![CDATA[id_dsa.pub]]></ROOT_PUBKEY><TARGET><![CDATA[hdc]]></TARGET><USERNAME><![CDATA[open
nebula]]></USERNAME><USER_PUBKEY><![CDATA[id_dsa.pub]]></USER_PUBKEY></CONTEXT><CPU><![CDATA[1]]></CPU><DISK><CLONE><![CDATA[NO]]><
/CLONE><CLONE_TARGET><![CDATA[SYSTEM]]></CLONE_TARGET><CLUSTER_ID><![CDATA[101]]></CLUSTER_ID><DATASTORE><![CDATA[ip6_img_ds]]></DA
TASTORE><DATASTORE_ID><![CDATA[101]]></DATASTORE_ID><DEV_PREFIX><![CDATA[hd]]></DEV_PREFIX><DISK_ID><![CDATA[0]]></DISK_ID><IMAGE><
![CDATA[fgt6x4_os]]></IMAGE><IMAGE_ID><![CDATA[5]]></IMAGE_ID><IMAGE_UNAME><![CDATA[oneadmin]]></IMAGE_UNAME><LN_TARGET><![CDATA[SY
STEM]]></LN_TARGET><PERSISTENT><![CDATA[YES]]></PERSISTENT><READONLY><![CDATA[NO]]></READONLY><SAVE><![CDATA[YES]]></SAVE><SIZE><![
CDATA[46080]]></SIZE><SOURCE><![CDATA[/var/lib/one//datastores/101/3078b4235100008fbdbf9dff7eea95b1]]></SOURCE><TARGET><![CDATA[vda
]]></TARGET><TM_MAD><![CDATA[ssh]]></TM_MAD><TYPE><![CDATA[FILE]]></TYPE></DISK><DISK><DEV_PREFIX><![CDATA[hd]]></DEV_PREFIX><DISK_
ID><![CDATA[1]]></DISK_ID><SIZE><![CDATA[5120]]></SIZE><TARGET><![CDATA[vdb]]></TARGET><TYPE><![CDATA[swap]]></TYPE></DISK><FEATURE
S><ACPI><![CDATA[yes]]></ACPI></FEATURES><GRAPHICS><AUTOPORT><![CDATA[yes]]></AUTOPORT><KEYMAP><![CDATA[en-us]]></KEYMAP><LISTEN><!
[CDATA[127.0.0.1]]></LISTEN><PORT><![CDATA[5926]]></PORT><TYPE><![CDATA[vnc]]></TYPE></GRAPHICS><MEMORY><![CDATA[4096]]></MEMORY><N
IC><BRIDGE><![CDATA[br0]]></BRIDGE><CLUSTER_ID><![CDATA[101]]></CLUSTER_ID><IP><![CDATA[131.225.41.169]]></IP><IP6_LINK><![CDATA[fe
80::216:3eff:fe06:604]]></IP6_LINK><MAC><![CDATA[00:16:3e:06:06:04]]></MAC><MODEL><![CDATA[virtio]]></MODEL><NETWORK><![CDATA[Stati
c_IPV6_Public]]></NETWORK><NETWORK_ID><![CDATA[1]]></NETWORK_ID><NETWORK_UNAME><![CDATA[oneadmin]]></NETWORK_UNAME><NIC_ID><![CDATA
      
[0]]></NIC_ID><VLAN><![CDATA[NO]]></VLAN></NIC><OS><ARCH><![CDATA[x86_64]]></ARCH></OS><RAW><DATA><![CDATA[
                      <devices>
                      <serial type='pty'>
                              <target port='0'/>
                      </serial>
                      <console type='pty'>
                      <target type='serial' port='0'/>
                      </console>

</devices>]]></DATA><TYPE><![CDATA[kvm]]></TYPE></RAW><TEMPLATE_ID><![CDATA[6]]></TEMPLATE_ID><VCPU><![CDATA[2]]></VCPU><VMID><![CD
      ATA[26]]></VMID></TEMPLATE><USER_TEMPLATE><ERROR><![CDATA[Fri Jul 25 
15:37:48 2014 : Error saving VM state: Could not
      save one-26 
to/var/lib/one/datastores/102/26/checkpoint]]></ERROR><NPTYPE><![CDATA[NPERNLM]]></NPTYPE><RANK><![CDATA[FREEMEMORY]]></RANK><USERVO>
<![CDATA[test181818]]></USERVO></USER_TEMPLATE><HISTORY_RECORDS><HISTORY><OID>26</OID><SEQ>0</SEQ><HOSTNAME>fgtest14</HOSTNAME><HID
>10</HID><CID>101</CID><STIME>1396463752</STIME><ETIME>0</ETIME><VMMMAD>kvm</VMMMAD><VNMMAD>dummy</VNMMAD><TMMAD>ssh</TMMAD><DS_LOC
ATION>/var/lib/one/datastores</DS_LOCATION><DS_ID>102</DS_ID><PSTIME>1396463752</PSTIME><PETIME>1396465032</PETIME><RSTIME>13964650
32</RSTIME><RETIME>0</RETIME><ESTIME>0</ESTIME><EETIME>0</EETIME><REASON>0</REASON><ACTION>0</ACTION></HISTORY></HISTORY_RECORDS></
      VM>
            uid: 0
            gid: 0
      last_poll: 1406320668
          state: 3
      lcm_state: 3
        owner_u: 1
        group_u: 0
        other_u: 0
      1 row in set (0.00 sec)


      -------------------------------



      On Wed, 30 Jul 2014, Steven Timm wrote:

            On Wed, 30 Jul 2014, Ruben S. Montero wrote:


                   Not really sure what can be going on...  The monitor
                   scripts return the information of all VMs running in the
                   node.  In 4.6 the monitoring system uses a push approach,
                   through UDP, so you may have the information being
                   reported by misbehaved monitoring daemons.  Sometimes
                   this may happen in dev environments if you are resetting
                   the DB...


            When we ran the update to take this database from ONE 4.4 to
            ONE 4.6, one host (the aforementioned fgtest14) and one
            datastore (image store 101) got wiped out of the database.  I
            re-inserted them both and restarted OpenNebula.

            Steve Timm





                   On Jul 28, 2014 6:32 PM, "Steven Timm" <[email protected]> wrote:

                          I am currently dealing with an unexplained
                          monitoring question in OpenNebula 4.6 on my
                          development cloud.

                          I frequently see OpenNebula report the status of
                          a ONe host as "ON" even in the case of a system
                          misconfiguration where, given the credentials, it
                          is impossible for OpenNebula to even ssh into the
                          node as oneadmin.


                          I've fixed all those instances and restarted
                          OpenNebula, but OpenNebula still reports a number
                          of VMs in state "running" even though the node
                          they are running on was rebooted three days ago
                          and is running no virtual machines whatsoever.

                          I think I could be dealing with database
                          corruption of some type (generated in the
                          ONE 4.4 -> ONE 4.6 update), or there could be
                          some problem with the remote scripts on the
                          nodes.  I saw, and I think I fixed, the problems
                          with the database corruption (namely, one of the
                          hosts and one of the datastores got knocked out
                          of the database for reasons unknown, and I
                          re-inserted them).  But in any case there is some
                          error handling that is not working in the
                          monitoring, and something is exiting with status
                          0 that shouldn't be.

                          Ideas?  Has anyone else seen something like this?

                         Steve Timm



                         
                          ------------------------------------------------------------------
                          Steven C. Timm, Ph.D  (630) 840-8525
                          [email protected]  http://home.fnal.gov/~timm/
                          Fermilab Scientific Computing Division, Scientific Computing Services Quad.
                          Grid and Cloud Services Dept., Associate Dept. Head for Cloud Computing
                          _______________________________________________
                          Users mailing list
                          [email protected]
                          http://lists.opennebula.org/listinfo.cgi/users-opennebula.org




            ------------------------------------------------------------------
            Steven C. Timm, Ph.D  (630) 840-8525
            [email protected]  http://home.fnal.gov/~timm/
            Fermilab Scientific Computing Division, Scientific Computing Services Quad.
            Grid and Cloud Services Dept., Associate Dept. Head for Cloud Computing


      ------------------------------------------------------------------
      Steven C. Timm, Ph.D  (630) 840-8525
      [email protected]  http://home.fnal.gov/~timm/
      Fermilab Scientific Computing Division, Scientific Computing Services Quad.
      Grid and Cloud Services Dept., Associate Dept. Head for Cloud Computing




-- 
Ruben S. Montero, PhD
Project co-Lead and Chief Architect OpenNebula - Flexible Enterprise Cloud Made Simple
www.OpenNebula.org | [email protected] | @OpenNebula



------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
[email protected]  http://home.fnal.gov/~timm/
Fermilab Scientific Computing Division, Scientific Computing Services Quad.
Grid and Cloud Services Dept., Associate Dept. Head for Cloud Computing
_______________________________________________
Users mailing list
[email protected]
http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
