Hello

On 26.09.2011 13:44, Carlos Martín Sánchez wrote:
> The host_shares contains the "running_vms" column; you need to update that
> column value with OpenNebula stopped.
>
> We are still trying to figure out what causes this bug, so if you come
> across it again, it would be great if you could write down the operations
> that led to it.

I do not know if this is related or not, but I guess it could be an indication.

I am running OpenNebula 2.2.1 with a MySQL database. I just restarted mysqld, and now all the one* commands report errors like this:

# onevm list
[VirtualMachinePoolInfo] Error getting VM Pool.

In oned.log I see the following messages (regarding the 'onevm list' command):

Tue Sep 27 13:47:20 2011 [ReM][D]: VirtualMachinePoolInfo method invoked
Tue Sep 27 13:47:20 2011 [ONE][E]: SQL command was: SELECT vm_pool.oid, vm_pool.uid, vm_pool.name, vm_pool.last_poll, vm_pool.state, vm_pool.lcm_state, vm_pool.stime, vm_pool.etime, vm_pool.deploy_id, vm_pool.memory, vm_pool.cpu, vm_pool.net_tx, vm_pool.net_rx, vm_pool.last_seq, vm_pool.template, user_pool.user_name, history.vid, history.seq, history.host_name, history.vm_dir, history.hid, history.vm_mad, history.tm_mad, history.stime, history.etime, history.pstime, history.petime, history.rstime, history.retime, history.estime, history.eetime, history.reason FROM vm_pool LEFT OUTER JOIN history ON vm_pool.oid = history.vid AND history.seq = vm_pool.last_seq LEFT OUTER JOIN (SELECT oid,user_name FROM user_pool) AS user_pool ON vm_pool.uid = user_pool.oid WHERE vm_pool.state <> 6, error 2006 : MySQL server has gone away
Tue Sep 27 13:47:20 2011 [ReM][E]: [VirtualMachinePoolInfo] Error getting VM Pool.

And some other general messages, probably from monitoring:

Tue Sep 27 13:47:13 2011 [ONE][E]: SQL command was: SELECT oid, im_mad FROM host_pool WHERE state != 4 ORDER BY last_mon_time ASC LIMIT 15, error 2006 : MySQL server has gone away
Tue Sep 27 13:47:13 2011 [ONE][E]: SQL command was: SELECT oid FROM vm_pool WHERE last_poll <= 1317130633 and state = 3 and ( lcm_state = 3 or lcm_state = 16 ) ORDER BY last_poll ASC LIMIT 5, error 2006 : MySQL server has gone away

For some reason oned does not reconnect to the MySQL server. I do not know how this is implemented (or whether it depends on my system), but I think that if the MySQL client library is used, the reconnect should happen automatically and transparently. A mysql client that was still running across the restart of mysqld handles this just fine and transparently (with just an informational message):

mysql> show databases;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    1
Current database: *** NONE ***

+--------------------+
| Database           |
+--------------------+
| information_schema |


After also restarting OpenNebula (oned and the scheduler), everything seems to work fine again. But I guess that if mysqld is down (or going down) at the wrong moment, the database might not have saved all the needed information, e.g. at the moment when the scheduler is deploying a VM to a cluster node. Could something like this cause the reporting errors Steve is seeing?


bye
Fabian