Re: [one-users] Monitor continually cycles through finding machines RUNNING and stat UNKNOWN

2014-01-21 Thread Javier Fontan
It seems that there are more people having this problem and we are looking at several ways to fix this. One problem with /var/run is that it is normally owned by root and a process started by the oneadmin user cannot write there. In the frontend a new directory for OpenNebula pid files is

Re: [one-users] Monitor continually cycles through finding machines RUNNING and stat UNKNOWN

2014-01-21 Thread Daniel Dehennin
Javier Fontan jfon...@opennebula.org writes: It seems that there are more people having this problem and we are looking at several ways to fix this. One problem with /var/run is that it is normally owned by root and a process started by the oneadmin user cannot write there. In the frontend

Re: [one-users] Monitor continually cycles through finding machines RUNNING and stat UNKNOWN

2014-01-21 Thread Gerry O'Brien
Hi, I've gotten down to only one collectd-client.rb process (see below). Are the multiple kvm-probes OK? Regards, Gerry
root@host101:~# ps -ef | grep one
oneadmin 3349 1 0 12:23 ? 00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm

Re: [one-users] Monitor continually cycles through finding machines RUNNING and stat UNKNOWN

2014-01-20 Thread Gerry O'Brien
Hi Ruben, Below is the output of 'ps -ef | grep one' on a host that has been disabled, rebooted and enabled. There are multiple instances of collectd-client.rb kvm running. Today we discovered a serious issue that is having an adverse effect on our DNS system. When the

Re: [one-users] Monitor continually cycles through finding machines RUNNING and stat UNKNOWN

2014-01-20 Thread Javier Fontan
The problem seems to be the large number of collectd processes running. Try killing all collectd-client.rb processes. There should be only one running per host. In case you want to use the old method of monitoring you can follow this guide:
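A minimal sketch of the "only one per host" check described above, separate from the guide the message refers to. It assumes the standard `ps -ef` column layout (UID PID PPID C STIME TTY TIME CMD); the function name `find_collectd_clients` is hypothetical and is not part of OpenNebula.

```python
def find_collectd_clients(ps_output: str) -> list:
    """Return the PIDs of collectd-client.rb processes found in `ps -ef` text.

    Assumes the usual eight `ps -ef` columns:
    UID PID PPID C STIME TTY TIME CMD
    """
    pids = []
    for line in ps_output.splitlines():
        # Split into at most 8 fields so the full command line stays intact.
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header line or a line we cannot parse
        if "collectd-client.rb" in fields[7]:
            pids.append(int(fields[1]))
    return pids
```

Feeding it the text of `ps -ef` from a host and checking whether more than one PID comes back would flag the condition discussed in this thread. Whether and how to stop the extra processes is left to the administrator.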

Re: [one-users] Monitor continually cycles through finding machines RUNNING and stat UNKNOWN - Possibly solved

2014-01-20 Thread Gerry O'Brien
Hi, I think we've figured out the cause of the issues reported above and they are particular to our installation. All our hosts use an NFS-mounted root partition. The reasons for using this approach are historical and were supposed to make it easier to keep the hosts equally

Re: [one-users] Monitor continually cycles through finding machines RUNNING and stat UNKNOWN

2014-01-20 Thread Javier Fontan
I've been trying to reproduce the problem, that is, making OpenNebula start a large number of collectd-client processes. The only way I was able to do it was when the file /tmp/one-collectd-client.pid exists and has the wrong permissions. Can you check the ownership and permissions of that file? On
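One way to gather the ownership and permission details asked for above is sketched here. The helper name `inspect_pid_file` and the shape of the returned dictionary are assumptions made for illustration; only the path /tmp/one-collectd-client.pid comes from the message.

```python
import os
import stat


def inspect_pid_file(path: str) -> dict:
    """Report whether a pid file exists, who owns it, and its permission bits."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        # A missing file is reported rather than raised.
        return {"exists": False}
    return {
        "exists": True,
        "uid": st.st_uid,                          # numeric owner id
        "mode": oct(stat.S_IMODE(st.st_mode)),     # e.g. '0o644'
        "writable_by_me": os.access(path, os.W_OK),
    }
```

Run as the oneadmin user on a host, `inspect_pid_file("/tmp/one-collectd-client.pid")` would show whether the file is missing, owned by another user, or not writable by the current user.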

Re: [one-users] Monitor continually cycles through finding machines RUNNING and stat UNKNOWN

2014-01-20 Thread Gerry O'Brien
Hi Javier, See my previous email. Another scenario is when /tmp/one-collectd-client.pid does not exist due to issues with /tmp. A change seems to have been made to put a pid file in /tmp instead of /run or /var/run. Regards, Gerry On 20/01/2014 17:44, Javier Fontan

Re: [one-users] Monitor continually cycles through finding machines RUNNING and stat UNKNOWN

2014-01-17 Thread Liu, Guang Jun (Gene)
I would like to add that we use ONE 4.4 (final) and see this UNKNOWN stat for some of the VMs as well. Thanks, Gene On Fri 17 Jan 2014 12:45:47 PM EST, Ruben S. Montero wrote: Hi Gerry Just to check, are you using 4.4 Final? We've seen this in the betas and thought we fixed for the final

Re: [one-users] Monitor continually cycles through finding machines RUNNING and stat UNKNOWN

2014-01-17 Thread Ruben S. Montero
OK, thanks. Filed an issue for this: http://dev.opennebula.org/issues/2656 We'll also try to reproduce it in our infrastructure. Cheers Ruben On Fri, Jan 17, 2014 at 7:00 PM, Liu, Guang Jun (Gene) gene@alcatel-lucent.com wrote: I would like to input -- We use ONE4.4 (final) and see