Hi Carlos,
   the problem is that I can't even get the XML of the VMs.
It seems to be related to how the XML in the "body" column of the database (for both hosts and VMs) is structured.

Digging into the migration scripts, I solved the host problem by adding a <VMS> node (even with no children) under the <HOST> tag of the body column in the "host_pool" table, but for the VMs I still have to find a solution.
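For reference, the host fix can be sketched roughly like this in Python. The host_pool table and its oid/body columns are taken from the failing SQL queries in oned.log; the placement of <VMS> before <TEMPLATE> matches the healthy host body shown below. I would only ever run something like this against a copy of one.db, never the live file:

```python
import sqlite3
import xml.etree.ElementTree as ET

def add_empty_vms_node(body):
    """Insert an empty <VMS/> element under <HOST>, placed before <TEMPLATE>
    as in a healthy 4.2 host body; no-op if the node is already present."""
    root = ET.fromstring(body)
    if root.tag != "HOST" or root.find("VMS") is not None:
        return body
    tags = [child.tag for child in root]
    idx = tags.index("TEMPLATE") if "TEMPLATE" in tags else len(tags)
    root.insert(idx, ET.Element("VMS"))
    return ET.tostring(root, encoding="unicode")

def patch_host_pool(db_path):
    """Rewrite every host body in host_pool; run against a COPY of one.db."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute("SELECT oid, body FROM host_pool").fetchall()
    for oid, body in rows:
        conn.execute("UPDATE host_pool SET body = ? WHERE oid = ?",
                     (add_empty_vms_node(body), oid))
    conn.commit()
    conn.close()
```

After patching a copy and checking that onehost show works against it, the file can be swapped back in with oned stopped.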

Now that host access works I'm able to submit and control new VM instances, but I have dozens of running VMs that I can't destroy at all (not even with the force switch turned on).

This is the XML of one of my hosts, as returned by onehost show -x (sensitive names are redacted with the "[...]" string):

<HOST>
  <ID>15</ID>
  <NAME>[...]</NAME>
  <STATE>2</STATE>
  <IM_MAD>im_kvm</IM_MAD>
  <VM_MAD>vmm_kvm</VM_MAD>
  <VN_MAD>dummy</VN_MAD>
  <LAST_MON_TIME>1377520947</LAST_MON_TIME>
  <CLUSTER_ID>101</CLUSTER_ID>
  <CLUSTER>[...]</CLUSTER>
  <HOST_SHARE>
    <DISK_USAGE>0</DISK_USAGE>
    <MEM_USAGE>20971520</MEM_USAGE>
    <CPU_USAGE>1800</CPU_USAGE>
    <MAX_DISK>0</MAX_DISK>
    <MAX_MEM>24596936</MAX_MEM>
    <MAX_CPU>2400</MAX_CPU>
    <FREE_DISK>0</FREE_DISK>
    <FREE_MEM>5558100</FREE_MEM>
    <FREE_CPU>2323</FREE_CPU>
    <USED_DISK>0</USED_DISK>
    <USED_MEM>19038836</USED_MEM>
    <USED_CPU>76</USED_CPU>
    <RUNNING_VMS>6</RUNNING_VMS>
  </HOST_SHARE>
  <VMS>
    <ID>326</ID>
  </VMS>
  <TEMPLATE>
    <ARCH><![CDATA[x86_64]]></ARCH>
    <CPUSPEED><![CDATA[1600]]></CPUSPEED>
    <FREECPU><![CDATA[2323.2]]></FREECPU>
    <FREEMEMORY><![CDATA[5558100]]></FREEMEMORY>
    <HOSTNAME><![CDATA[[...]]]></HOSTNAME>
    <HYPERVISOR><![CDATA[kvm]]></HYPERVISOR>
    <MODELNAME><![CDATA[Intel(R) Xeon(R) CPU E5645 @ 2.40GHz]]></MODELNAME>
    <NETRX><![CDATA[16007208117863]]></NETRX>
    <NETTX><![CDATA[1185926401588]]></NETTX>
    <TOTALCPU><![CDATA[2400]]></TOTALCPU>
    <TOTALMEMORY><![CDATA[24596936]]></TOTALMEMORY>
    <TOTAL_ZOMBIES><![CDATA[5]]></TOTAL_ZOMBIES>
    <USEDCPU><![CDATA[76.8000000000002]]></USEDCPU>
    <USEDMEMORY><![CDATA[19038836]]></USEDMEMORY>
    <ZOMBIES><![CDATA[one-324, one-283, one-314, one-317, one-304]]></ZOMBIES>
  </TEMPLATE>
</HOST>

As you can see, every host now reports its connected VMs as "zombies", probably because it can't query them.

I'm also sending you the XML from the "body" column of the vm_pool table for a VM I can't query with onevm show:

<VM>
   <ID>324</ID>
   <UID>0</UID>
   <GID>0</GID>
   <UNAME>oneadmin</UNAME>
   <GNAME>oneadmin</GNAME>
   <NAME>[...]</NAME>
   <PERMISSIONS>
      <OWNER_U>1</OWNER_U>
      <OWNER_M>1</OWNER_M>
      <OWNER_A>0</OWNER_A>
      <GROUP_U>0</GROUP_U>
      <GROUP_M>0</GROUP_M>
      <GROUP_A>0</GROUP_A>
      <OTHER_U>0</OTHER_U>
      <OTHER_M>0</OTHER_M>
      <OTHER_A>0</OTHER_A>
   </PERMISSIONS>
   <LAST_POLL>1375778872</LAST_POLL>
   <STATE>3</STATE>
   <LCM_STATE>3</LCM_STATE>
   <RESCHED>0</RESCHED>
   <STIME>1375457045</STIME>
   <ETIME>0</ETIME>
   <DEPLOY_ID>one-324</DEPLOY_ID>
   <MEMORY>4194304</MEMORY>
   <CPU>9</CPU>
   <NET_TX>432290511</NET_TX>
   <NET_RX>2072231827</NET_RX>
   <TEMPLATE>
      <CONTEXT>
         <ETH0_DNS><![CDATA[[...]]]></ETH0_DNS>
         <ETH0_GATEWAY><![CDATA[[...]]]></ETH0_GATEWAY>
         <ETH0_IP><![CDATA[[...]]]></ETH0_IP>
         <ETH0_MASK><![CDATA[[...]]]></ETH0_MASK>
         <FILES><![CDATA[[...]]]></FILES>
         <HOSTNAME><![CDATA[[...]]]></HOSTNAME>
         <TARGET><![CDATA[hdb]]></TARGET>
      </CONTEXT>
      <CPU><![CDATA[4]]></CPU>
      <DISK>
         <CLONE><![CDATA[YES]]></CLONE>
          <CLUSTER_ID><![CDATA[101]]></CLUSTER_ID>
          <DATASTORE><![CDATA[nonshared_ds]]></DATASTORE>
          <DATASTORE_ID><![CDATA[101]]></DATASTORE_ID>
         <DEV_PREFIX><![CDATA[hd]]></DEV_PREFIX>
         <DISK_ID><![CDATA[0]]></DISK_ID>
         <IMAGE><![CDATA[[...]]]></IMAGE>
         <IMAGE_ID><![CDATA[119]]></IMAGE_ID>
          <IMAGE_UNAME><![CDATA[oneadmin]]></IMAGE_UNAME>
         <READONLY><![CDATA[NO]]></READONLY>
         <SAVE><![CDATA[NO]]></SAVE>
          <SOURCE><![CDATA[/var/lib/one/datastores/101/3860dfcd1bec39ce672ba855564b44ca]]></SOURCE>
         <TARGET><![CDATA[hda]]></TARGET>
         <TM_MAD><![CDATA[ssh]]></TM_MAD>
         <TYPE><![CDATA[FILE]]></TYPE>
      </DISK>
      <DISK>
         <DEV_PREFIX><![CDATA[hd]]></DEV_PREFIX>
         <DISK_ID><![CDATA[1]]></DISK_ID>
         <FORMAT><![CDATA[ext3]]></FORMAT>
         <SIZE><![CDATA[26000]]></SIZE>
         <TARGET><![CDATA[hdc]]></TARGET>
         <TYPE><![CDATA[fs]]></TYPE>
      </DISK>
      <DISK>
         <DEV_PREFIX><![CDATA[hd]]></DEV_PREFIX>
         <DISK_ID><![CDATA[2]]></DISK_ID>
         <SIZE><![CDATA[8192]]></SIZE>
         <TARGET><![CDATA[hdd]]></TARGET>
         <TYPE><![CDATA[swap]]></TYPE>
      </DISK>
      <FEATURES>
         <ACPI><![CDATA[yes]]></ACPI>
      </FEATURES>
      <GRAPHICS>
         <KEYMAP><![CDATA[it]]></KEYMAP>
         <LISTEN><![CDATA[0.0.0.0]]></LISTEN>
         <PORT><![CDATA[6224]]></PORT>
         <TYPE><![CDATA[vnc]]></TYPE>
      </GRAPHICS>
      <MEMORY><![CDATA[4096]]></MEMORY>
      <NAME><![CDATA[[...]]]></NAME>
      <NIC>
         <BRIDGE><![CDATA[br1]]></BRIDGE>
          <CLUSTER_ID><![CDATA[101]]></CLUSTER_ID>
         <IP><![CDATA[[...]]]></IP>
          <MAC><![CDATA[02:00:c0:a8:1e:02]]></MAC>
         <MODEL><![CDATA[virtio]]></MODEL>
         <NETWORK><![CDATA[[...]]]></NETWORK>
         <NETWORK_ID><![CDATA[9]]></NETWORK_ID>
          <NETWORK_UNAME><![CDATA[oneadmin]]></NETWORK_UNAME>
         <VLAN><![CDATA[NO]]></VLAN>
      </NIC>
      <OS>
         <ARCH><![CDATA[x86_64]]></ARCH>
         <BOOT><![CDATA[hd]]></BOOT>
      </OS>
      <RAW>
         <TYPE><![CDATA[kvm]]></TYPE>
      </RAW>
      <REQUIREMENTS><![CDATA[CLUSTER_ID = 101]]></REQUIREMENTS>
      <TEMPLATE_ID><![CDATA[38]]></TEMPLATE_ID>
      <VCPU><![CDATA[4]]></VCPU>
      <VMID><![CDATA[324]]></VMID>
   </TEMPLATE>
   <HISTORY_RECORDS>
      <HISTORY>
         <OID>324</OID>
         <SEQ>0</SEQ>
         <HOSTNAME>[...]</HOSTNAME>
         <HID>15</HID>
         <STIME>1375457063</STIME>
         <ETIME>0</ETIME>
         <VMMMAD>vmm_kvm</VMMMAD>
         <VNMMAD>dummy</VNMMAD>
         <TMMAD>ssh</TMMAD>
         <DS_LOCATION>/var/datastore</DS_LOCATION>
         <DS_ID>102</DS_ID>
         <PSTIME>1375457063</PSTIME>
         <PETIME>1375457263</PETIME>
         <RSTIME>1375457263</RSTIME>
         <RETIME>0</RETIME>
         <ESTIME>0</ESTIME>
         <EETIME>0</EETIME>
         <REASON>0</REASON>
      </HISTORY>
   </HISTORY_RECORDS>
</VM>

I think it would be of great help to have the updated XSD files for all the body columns in the database: I'd be able to validate the XML structure of all the tables and highlight migration problems.
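Until those XSDs turn up, a rough interim check is at least possible. The sketch below is not real schema validation: it only tests that each body is well-formed XML with the expected root element. The table names and the oid/body columns are assumptions taken from the SQL errors in oned.log; EXPECTED would need to be extended with the other pool tables:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Expected root element per pool table; extend with the other pools
# (image_pool, network_pool, ...) as needed.
EXPECTED = {"host_pool": "HOST", "vm_pool": "VM"}

def scan_bodies(db_path):
    """Return (table, oid, problem) tuples for every body column that fails
    to parse or whose root element is not the expected one."""
    problems = []
    conn = sqlite3.connect(db_path)
    for table, root_tag in EXPECTED.items():
        for oid, body in conn.execute(f"SELECT oid, body FROM {table}"):
            try:
                root = ET.fromstring(body)
            except ET.ParseError as exc:
                problems.append((table, oid, f"not well-formed: {exc}"))
                continue
            if root.tag != root_tag:
                problems.append(
                    (table, oid, f"root is <{root.tag}>, expected <{root_tag}>"))
    conn.close()
    return problems
```

It wouldn't catch a missing <VMS> node the way a proper XSD would, but it should flag the rows that make oned abort the query.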

Thanks! :)

F.


On 21/08/2013 at 12:13, Carlos Martín Sánchez wrote:
Hi,

Could you send us the XML of some of the failing VMs and hosts? You can get it with the -x flag in onevm/onehost list.

Send them off-list if you prefer.

Regards

--
Join us at OpenNebulaConf2013 <http://opennebulaconf.com> in Berlin, 24-26 September, 2013
--
Carlos Martín, MSc
Project Engineer
OpenNebula - The Open-source Solution for Data Center Virtualization
www.OpenNebula.org <http://www.OpenNebula.org> | [email protected] <mailto:[email protected]> | @OpenNebula <http://twitter.com/opennebula>


On Thu, Aug 8, 2013 at 11:29 AM, Federico Zani <[email protected] <mailto:[email protected]>> wrote:

    Hi,
      I am experiencing some issues after the upgrade from 3.7 to 4.2
    (frontend on CentOS 6.4 and KVM hosts); this is what I did:

     - Stopped one and sunstone and backed up /etc/one
     - yum localinstall opennebula-4.2.0-1.x86_64.rpm
    opennebula-java-4.2.0-1.x86_64.rpm
    opennebula-ruby-4.2.0-1.x86_64.rpm
    opennebula-server-4.2.0-1.x86_64.rpm
    opennebula-sunstone-4.2.0-1.x86_64.rpm
     - duplicated the im and vmm MADs for KVM as specified here
    http://opennebula.org/documentation:archives:rel4.0:upgrade#driver_names

     - checked for other mismatches in oned.conf but found nothing to fix
     - onedb upgrade -v --sqlite /var/lib/one/one.db (no errors, just
    a few warnings about manual fixes needed, which I applied)
     - moved the VM description files from /var/lib/one/[0-9]* to
    /var/lib/one/vms/

    Then I tried to fsck the SQLite DB but got the following error:
    --------------
    onedb fsck -f -v -s /var/lib/one/one.db
    Version read:
    4.2.0 : Database migrated from 3.7.80 to 4.2.0 (OpenNebula 4.2.0)
    by onedb command.

    Sqlite database backup stored in /var/lib/one/one.db.bck
    Use 'onedb restore' or copy the file back to restore the DB.
      > Running fsck

    Datastore 0 is missing fom Cluster 101 datastore id list
    Image 127 is missing fom Datastore 101 image id list
    undefined method `elements' for nil:NilClass
    Error running fsck version 4.2.0
    The database will be restored
    Sqlite database backup restored in /var/lib/one/one.db
    -----------

    I also tried to reinstall the Ruby gems with
    /usr/share/one/install_gems, but still got the same issue.

    After some searching, I tried to start one and sunstone-server
    anyway, and this is the result:
     - "onevm list" and "onehost list" work correctly
     - "onevm show" on a terminated VM shows the correct information
     - "onevm show" on a running VM, or "onehost show", returns
    "[VirtualMachineInfo] Error getting virtual machine [312]." or
    "[HostInfo] Error getting host [30]." respectively

    In the log file (/var/log/oned.log) I can see the following
    errors when issuing those commands:
    ----------
    Tue Aug  6 12:49:40 2013 [ONE][E]: SQL command was: SELECT body
    FROM host_pool WHERE oid = 30, error: callback requested query abort
    Tue Aug  6 12:49:40 2013 [ONE][E]: SQL command was: SELECT body
    FROM vm_pool WHERE oid = 312, error: callback requested query abort
    ------------

    I can still see datastore information and the overall state of my
    private cloud through the Sunstone dashboard, but it seems I cannot
    access information related to running VMs and hosts, which leaves
    the private cloud unusable (can't stop VMs, can't start new ones,
    etc.)

    Any clues?

    Federico.

    _______________________________________________
    Users mailing list
    [email protected] <mailto:[email protected]>
    http://lists.opennebula.org/listinfo.cgi/users-opennebula.org



