Re: [one-users] Monitor continually cycles through finding machines RUNNING and stat UNKNOWN

2014-01-21 Thread Javier Fontan
It seems that more people are having this problem, and we are
looking at several ways to fix it. One problem with /var/run
is that it is normally owned by root, so a process started by the
oneadmin user cannot write there. On the frontend a new directory for
OpenNebula pid files is created, but on the nodes it does not exist.

On Tue, Jan 21, 2014 at 8:07 AM, Gerry O'Brien ge...@scss.tcd.ie wrote:
 Hi Javier,

   See my previous email. Another scenario is when
 /tmp/one-collectd-client.pid does not exist due to issues with /tmp.

A change seems to have been made to put a pid file in /tmp instead of
 /run or /var/run.

 Regards,
   Gerry




Re: [one-users] Monitor continually cycles through finding machines RUNNING and stat UNKNOWN

2014-01-21 Thread Daniel Dehennin
Javier Fontan jfon...@opennebula.org writes:

 It seems that more people are having this problem, and we are
 looking at several ways to fix it. One problem with /var/run
 is that it is normally owned by root, so a process started by the
 oneadmin user cannot write there. On the frontend a new directory for
 OpenNebula pid files is created, but on the nodes it does not exist.

Hello,

What do you think about a “ONE node setup init script”?

On my Debian systems I have opennebula-common and opennebula-node on
each node.

The -node package could include an init script to set up an OpenNebula
directory in /var/run[1] with the proper owner and permissions.
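
A minimal sketch of what such a node setup script might do. The
directory /var/run/one and the function name are assumptions made for
illustration; they are not taken from the opennebula-node package.

```sh
#!/bin/sh
# Sketch only: directory name and function name are hypothetical,
# not something shipped by the opennebula-node package.

# setup_one_rundir DIR OWNER
# Creates DIR with mode 0755 and makes OWNER its owner.
setup_one_rundir() {
    dir="$1"
    owner="$2"
    mkdir -p "$dir" || return 1
    chmod 0755 "$dir" || return 1
    # Handing a directory to another user only works as root, so skip
    # chown when the directory already belongs to the requested owner.
    if [ "$(stat -c %U "$dir")" != "$owner" ]; then
        chown "$owner" "$dir" || return 1
    fi
}

# Example call from an init script running as root (hypothetical path):
# setup_one_rundir /var/run/one oneadmin
```

Because /run is usually a tmpfs that is emptied on reboot, a script
like this would have to run at every boot, which is why an init script
is a natural place for it.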

My 2¢.

Regards.

Footnotes: 
[1]  now /run on Debian systems

-- 
Daniel Dehennin
Get my GPG key:
gpg --keyserver pgp.mit.edu --recv-keys 0x7A6FE2DF


___
Users mailing list
Users@lists.opennebula.org
http://lists.opennebula.org/listinfo.cgi/users-opennebula.org


Re: [one-users] Monitor continually cycles through finding machines RUNNING and stat UNKNOWN

2014-01-21 Thread Gerry O'Brien

Hi,

I've gotten down to only one collectd-client.rb process (see 
below). Are the multiple kvm-probes processes OK?


Regards,
  Gerry




root@host101:~# ps -ef | grep one
oneadmin  3349     1  0 12:23 ?        00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 21068  3349  0 12:51 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 21076 21068  0 12:51 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 21077 21076  0 12:51 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie





On 21/01/2014 10:10, Javier Fontan wrote:

It seems that more people are having this problem, and we are
looking at several ways to fix it. One problem with /var/run
is that it is normally owned by root, so a process started by the
oneadmin user cannot write there. On the frontend a new directory for
OpenNebula pid files is created, but on the nodes it does not exist.


Re: [one-users] Monitor continually cycles through finding machines RUNNING and stat UNKNOWN

2014-01-20 Thread Gerry O'Brien

Hi Ruben,

Below is the output of 'ps -ef | grep one' on a host that has been 
disabled, rebooted and enabled. There are multiple instances of 
collectd-client.rb kvm running.



We discovered today a serious issue that is having an adverse 
effect on our DNS system. When the machine below was enabled, our 
DNS server was immediately flooded with requests from the host (see a 
sample below).
 Our logs show that this has only started happening since the 
upgrade to 4.4. If we don't get a fix for this we will have to go back 
to 4.2, which is something I really don't want to do.


Regards,
Gerry




oneadmin  3628     1  0 13:04 ?        00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin  4600     1  0 13:05 ?        00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin  6400     1  0 13:07 ?        00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin  9003     1  0 13:08 ?        00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 12953  3628  0 13:10 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 12955  6400  0 13:10 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 12969 12953  0 13:10 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 12970 12969  0 13:10 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 12972 12955  0 13:10 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 12973 12972  0 13:10 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 13029 12973  0 13:10 ?        00:00:00 /bin/bash ./monitor_ds.sh kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 13030 12970  0 13:10 ?        00:00:00 /bin/bash ./monitor_ds.sh kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie




-2014 13:14:26.675 client 134.226.59.101#52314: query: 
host101.scss.tcd.ie IN  + (134.226.32.57)
20-Jan-2014 13:14:26.680 client 134.226.59.101#51356: query: 
host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:26.680 client 134.226.59.101#51356: query: 
host101.scss.tcd.ie IN  + (134.226.32.57)
20-Jan-2014 13:14:26.822 client 134.226.59.101#47870: query: 
host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:26.822 client 134.226.59.101#47870: query: 
host101.scss.tcd.ie IN  + (134.226.32.57)
20-Jan-2014 13:14:26.824 client 134.226.59.101#58734: query: 
host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:26.825 client 134.226.59.101#58734: query: 
host101.scss.tcd.ie IN  + (134.226.32.57)
20-Jan-2014 13:14:26.952 client 134.226.59.101#39659: query: 
host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:26.952 client 134.226.59.101#39659: query: 
host101.scss.tcd.ie IN  + (134.226.32.57)
20-Jan-2014 13:14:26.952 client 134.226.59.101#53975: query: 
host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:26.953 client 134.226.59.101#53975: query: 
host101.scss.tcd.ie IN  + (134.226.32.57)
20-Jan-2014 13:14:27.108 client 134.226.59.101#36294: query: 
host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:27.108 client 134.226.59.101#36294: query: 
host101.scss.tcd.ie IN  + (134.226.32.57)
20-Jan-2014 13:14:27.109 client 134.226.59.101#59277: query: 
host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:27.109 client 134.226.59.101#59277: query: 
host101.scss.tcd.ie IN  + (134.226.32.57)
20-Jan-2014 13:14:27.347 client 134.226.59.101#49614: query: 
host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:27.348 client 134.226.59.101#49614: query: 
host101.scss.tcd.ie IN  + (134.226.32.57)
20-Jan-2014 13:14:27.350 client 134.226.59.101#44058: query: 
host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:27.357 client 134.226.59.101#44058: query: 
host101.scss.tcd.ie IN  + (134.226.32.57)
20-Jan-2014 13:14:27.458 client 134.226.59.101#51830: query: 
host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:27.458 client 134.226.59.101#51830: query: 
host101.scss.tcd.ie IN  + (134.226.32.57)
20-Jan-2014 13:14:27.461 client 134.226.59.101#38419: query: 
host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:27.461 client 134.226.59.101#38419: query: 
host101.scss.tcd.ie IN  + (134.226.32.57)
20-Jan-2014 13:14:31.184 

Re: [one-users] Monitor continually cycles through finding machines RUNNING and stat UNKNOWN

2014-01-20 Thread Javier Fontan
The problem seems to be the large number of collectd-client processes
running. Try killing all collectd-client.rb processes. There should be
only one running per host.
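
A minimal sketch of how the count could be checked before and after the
cleanup. The helper name count_collectd_clients is made up for this
example; it only counts lines of ps output read from stdin.

```sh
#!/bin/sh
# count_collectd_clients (hypothetical helper)
# Reads `ps -ef` style output on stdin and prints how many lines
# mention collectd-client.rb.
count_collectd_clients() {
    # grep -c prints the count but exits non-zero when it is 0,
    # so mask the exit status.
    grep -c 'collectd-client\.rb' || true
}

# On a node, as a user allowed to signal oneadmin's processes:
# ps -ef | count_collectd_clients           # expect 1 on a healthy host
# pkill -u oneadmin -f collectd-client.rb   # kill every instance
```

After the kill, the next monitoring cycle should start a single new
instance, assuming the pid file problem discussed in this thread is not
present.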

In case you want to use the old method of monitoring you can follow this guide:

http://docs.opennebula.org/stable/administration/monitoring/imsshpullg.html#imsshpullg


Re: [one-users] Monitor continually cycles through finding machines RUNNING and stat UNKNOWN - Possibly solved

2014-01-20 Thread Gerry O'Brien

Hi,

I think we've figured out the cause of the issues reported above 
and they are particular to our installation.


All our hosts use an NFS-mounted root partition. The reasons for 
using this approach are historical; it was supposed to make it easier 
to keep the hosts equally up to date.
   The issue here was that /tmp was the same for every host. 
collectd-client.rb writes its PID in /tmp, so 
collectd-client_control.sh couldn't find the PID of the already running 
collectd-client.rb and started multiple instances of it.
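
To illustrate the collision: with a shared /tmp, every host reads and 
writes the same /tmp/one-collectd-client.pid. A minimal sketch of one 
way to keep the files apart, assuming a hypothetical per-host file 
name. This is not how OpenNebula names the file and not the fix used 
here.

```sh
#!/bin/sh
# per_host_pidfile [DIR] [HOST]  (hypothetical helper)
# Prints a pid file path that embeds the host name, so hosts sharing
# DIR over NFS do not overwrite each other's pid file.
per_host_pidfile() {
    dir="${1:-/tmp}"
    host="${2:-$(hostname)}"
    printf '%s/one-collectd-client.%s.pid\n' "$dir" "$host"
}

# per_host_pidfile              # uses /tmp and the local host name
# per_host_pidfile /tmp host101
```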


My guess is that the DNS issue is related to the explicit use of 
the hostname in ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm 
/var/lib/one//datastores 4124 20 3 host104.scss.tcd.ie. This seems to 
have changed since 4.2.
The multiple copies of collectd-client.rb only exacerbated the 
problem. As we have a single hosts file shared by every host, the 
solution was to place entries for all hosts in /etc/hosts.
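
A minimal sketch for checking that a host name is listed in the shared 
hosts file. The helper name has_hosts_entry is made up for this 
example, and the address in the usage comment is a documentation 
placeholder, not one of our real addresses.

```sh
#!/bin/sh
# has_hosts_entry NAME [FILE]  (hypothetical helper)
# Succeeds if NAME appears as a host name or alias in FILE
# (default /etc/hosts), ignoring comments.
has_hosts_entry() {
    name="$1"
    file="${2:-/etc/hosts}"
    awk -v n="$name" '
        { sub(/#.*/, "") }    # strip comments, fields are re-split
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit found ? 0 : 1 }
    ' "$file"
}

# Expected line format in /etc/hosts (placeholder address):
#   192.0.2.101  host101.scss.tcd.ie  host101
# has_hosts_entry host101.scss.tcd.ie || echo "host101 missing from /etc/hosts"
```

Field 1 (the address) is skipped on purpose, so only names and aliases 
are matched.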


Regards,
  Gerry


On 20/01/2014 15:15, Javier Fontan wrote:

The problem seems to be the large number of collectd-client processes
running. Try killing all collectd-client.rb processes. There should be
only one running per host.

In case you want to use the old method of monitoring you can follow this guide:

http://docs.opennebula.org/stable/administration/monitoring/imsshpullg.html#imsshpullg

Re: [one-users] Monitor continually cycles through finding machines RUNNING and stat UNKNOWN

2014-01-20 Thread Javier Fontan
I've been trying to reproduce the problem, that is, making OpenNebula
start a large number of collectd-client processes. The only way I was
able to do it was when the file /tmp/one-collectd-client.pid exists
and has the wrong permissions. Can you check the ownership and
permissions of that file?
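
A minimal sketch of such a check. The helper name pidfile_state is made
up for this example and is not part of the OpenNebula probes; it
classifies a pid file rather than reproducing what
collectd-client_control.sh does.

```sh
#!/bin/sh
# pidfile_state FILE  (hypothetical helper)
# Prints one of: missing, unreadable, invalid, running, stale.
pidfile_state() {
    pidfile="$1"
    [ -e "$pidfile" ] || { echo missing; return 0; }
    [ -r "$pidfile" ] || { echo unreadable; return 0; }
    pid=$(head -n 1 "$pidfile" | tr -d '[:space:]')
    case "$pid" in
        ''|*[!0-9]*) echo invalid; return 0 ;;
    esac
    # kill -0 sends no signal; it only tests whether the PID can be
    # signalled. It also fails for a live process owned by another user.
    if kill -0 "$pid" 2>/dev/null; then
        echo running
    else
        echo stale
    fi
}

# Run as oneadmin on the node:
# ls -l /tmp/one-collectd-client.pid       # ownership and permissions
# pidfile_state /tmp/one-collectd-client.pid
```

Any result other than "running" while a collectd-client.rb process is
alive on the host would point at the pid file rather than the probe.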

On Mon, Jan 20, 2014 at 4:15 PM, Javier Fontan jfon...@opennebula.org wrote:
 The problem seems to be the large number of collectd-client processes
 running. Try killing all collectd-client.rb processes. There should be
 only one running per host.

 In case you want to use the old method of monitoring you can follow this 
 guide:

 http://docs.opennebula.org/stable/administration/monitoring/imsshpullg.html#imsshpullg


Re: [one-users] Monitor continually cycles through finding machines RUNNING and stat UNKNOWN

2014-01-20 Thread Gerry O'Brien

Hi Javier,

  See my previous email. Another scenario is when 
/tmp/one-collectd-client.pid does not exist due to issues with /tmp.


   A change seems to have been made to put a pid file in /tmp instead 
of /run or /var/run.


Regards,
  Gerry


On 20/01/2014 17:44, Javier Fontan wrote:

I've been trying to reproduce the problem, that is, making OpenNebula
start a large number of collectd-client processes. The only way I was
able to do it was when the file /tmp/one-collectd-client.pid exists
and has the wrong permissions. Can you check the ownership and
permissions of that file?

On Mon, Jan 20, 2014 at 4:15 PM, Javier Fontan jfon...@opennebula.org wrote:

The problem seems to be the high amount of collectd processes running.
Try killing all collectd-client.rb processes. There should be only
one running per host.

In case you want to use the old method of monitoring you can follow this guide:

http://docs.opennebula.org/stable/administration/monitoring/imsshpullg.html#imsshpullg

On Mon, Jan 20, 2014 at 2:17 PM, Gerry O'Brien ge...@scss.tcd.ie wrote:

Hi Ruben,

 Below is the output of 'ps -ef | grep one' on a host that has been
disabled, rebooted and enabled. There are multiple instances of
collectd-client.rb kvm running.


 We discovered today a serious issue that is having an adverse
effect on our DNS system. When the machine below was enabled, our DNS
server was immediately flooded with requests from the host (see a sample below).
  Our logs show that this has only started happening since the upgrade to
4.4. If we don't get a fix for this we will have to go back to 4.2, which is
something I really don't want to do.

 Regards,
 Gerry




oneadmin  3628     1  0 13:04 ?        00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin  4600     1  0 13:05 ?        00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin  6400     1  0 13:07 ?        00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin  9003     1  0 13:08 ?        00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 12953  3628  0 13:10 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 12955  6400  0 13:10 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 12969 12953  0 13:10 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 12970 12969  0 13:10 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 12972 12955  0 13:10 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 12973 12972  0 13:10 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 13029 12973  0 13:10 ?        00:00:00 /bin/bash ./monitor_ds.sh kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 13030 12970  0 13:10 ?        00:00:00 /bin/bash ./monitor_ds.sh kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
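
The listing above contains four collectd-client.rb processes (PIDs 3628, 4600, 6400 and 9003), where one per host is expected. A small sketch for counting them from saved `ps -ef` output follows. The function name `count_collectd_clients` is hypothetical, and the sketch assumes the script path and the hypervisor argument that follows it appear on the same line of text.

```python
from collections import Counter


def count_collectd_clients(ps_output):
    """Count collectd-client.rb processes per hypervisor argument in `ps -ef` text."""
    counts = Counter()
    for line in ps_output.splitlines():
        fields = line.split()
        for index, field in enumerate(fields):
            if field.endswith("/collectd-client.rb"):
                # The argument after the script path is the hypervisor (e.g. kvm).
                hypervisor = fields[index + 1] if index + 1 < len(fields) else "?"
                counts[hypervisor] += 1
                break
    return counts
```

Any count above one for a given hypervisor on a single host points to the duplicate-process condition discussed in this thread.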



-2014 13:14:26.675 client 134.226.59.101#52314: query: host101.scss.tcd.ie
IN  + (134.226.32.57)
20-Jan-2014 13:14:26.680 client 134.226.59.101#51356: query:
host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:26.680 client 134.226.59.101#51356: query:
host101.scss.tcd.ie IN  + (134.226.32.57)
20-Jan-2014 13:14:26.822 client 134.226.59.101#47870: query:
host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:26.822 client 134.226.59.101#47870: query:
host101.scss.tcd.ie IN  + (134.226.32.57)
20-Jan-2014 13:14:26.824 client 134.226.59.101#58734: query:
host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:26.825 client 134.226.59.101#58734: query:
host101.scss.tcd.ie IN  + (134.226.32.57)
20-Jan-2014 13:14:26.952 client 134.226.59.101#39659: query:
host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:26.952 client 134.226.59.101#39659: query:
host101.scss.tcd.ie IN  + (134.226.32.57)
20-Jan-2014 13:14:26.952 client 134.226.59.101#53975: query:
host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:26.953 client 134.226.59.101#53975: query:
host101.scss.tcd.ie IN  + (134.226.32.57)
20-Jan-2014 13:14:27.108 client 134.226.59.101#36294: query:
host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:27.108 client 134.226.59.101#36294: query:
host101.scss.tcd.ie IN  + (134.226.32.57)
20-Jan-2014 13:14:27.109 client 134.226.59.101#59277: 
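
To put a number on the flood, the query-log sample can be summarised per client and per second. The sketch below assumes log entries that begin like the ones quoted here (date, time with milliseconds, `client <ip>#<port>: query:`); the names `QUERY_RE` and `queries_per_second` are hypothetical. Wrapped continuation lines and truncated entries simply do not match and are skipped.

```python
import re
from collections import Counter

# Matches the start of a query log entry as quoted in this thread.
QUERY_RE = re.compile(
    r"^\s*(\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2})\.\d+ client ([0-9.]+)#\d+: query:"
)


def queries_per_second(log_lines):
    """Count query entries per (timestamp truncated to the second, client IP)."""
    counts = Counter()
    for line in log_lines:
        match = QUERY_RE.match(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

Feeding the full DNS log through `queries_per_second` and sorting by count shows which hosts generate the most lookups in any given second.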

[one-users] Monitor continually cycles through finding machines RUNNING and stat UNKNOWN

2014-01-17 Thread Gerry O'Brien

Hi,

Below is a truncated log file for a VM. The monitor continually
cycles through finding the machine in state RUNNING and state UNKNOWN.
This occurs for many machines at the same time. All machines were
created by a script.


The VMs are Microsoft Windows 7 64bit Enterprise. Individual
context is created by a startup script. They run fine but eventually
/var/log/one is going to overflow.


Restarting oned seems to fix the problem but this is hardly a long 
term solution.


Any suggestions on what could be causing this?

Regards,
Gerry




Thu Jan 16 16:56:21 2014 [DiM][I]: New VM state is ACTIVE.
Thu Jan 16 16:56:22 2014 [LCM][I]: New VM state is PROLOG.
Thu Jan 16 16:56:22 2014 [VM][I]: Virtual Machine has no context
Thu Jan 16 16:56:22 2014 [LCM][I]: New VM state is BOOT
Thu Jan 16 16:56:22 2014 [VMM][I]: Generating deployment file: /var/lib/one/vms/1788/deployment.0
Thu Jan 16 16:56:23 2014 [VMM][I]: ExitCode: 0
Thu Jan 16 16:56:23 2014 [VMM][I]: Successfully execute network driver operation: pre.
Thu Jan 16 16:56:25 2014 [VMM][I]: ExitCode: 0
Thu Jan 16 16:56:25 2014 [VMM][I]: Successfully execute virtualization driver operation: deploy.
Thu Jan 16 16:56:25 2014 [VMM][I]: ExitCode: 0
Thu Jan 16 16:56:25 2014 [VMM][I]: Successfully execute network driver operation: post.

Thu Jan 16 16:56:25 2014 [LCM][I]: New VM state is RUNNING
Thu Jan 16 16:56:51 2014 [LCM][I]: New VM state is UNKNOWN
Thu Jan 16 16:59:01 2014 [VMM][I]: VM found again, state is RUNNING
Thu Jan 16 16:59:23 2014 [LCM][I]: New VM state is UNKNOWN
Thu Jan 16 17:01:41 2014 [VMM][I]: VM found again, state is RUNNING
Thu Jan 16 17:01:58 2014 [LCM][I]: New VM state is UNKNOWN
Thu Jan 16 17:04:18 2014 [VMM][I]: VM found again, state is RUNNING
Thu Jan 16 17:04:39 2014 [LCM][I]: New VM state is UNKNOWN
Thu Jan 16 17:06:55 2014 [VMM][I]: VM found again, state is RUNNING
Thu Jan 16 17:07:06 2014 [LCM][I]: New VM state is UNKNOWN
Thu Jan 16 17:09:31 2014 [VMM][I]: VM found again, state is RUNNING
Thu Jan 16 17:09:31 2014 [LCM][I]: New VM state is UNKNOWN
Thu Jan 16 17:12:22 2014 [VMM][I]: VM found again, state is RUNNING
Thu Jan 16 17:12:27 2014 [LCM][I]: New VM state is UNKNOWN
Thu Jan 16 17:15:11 2014 [VMM][I]: VM found again, state is RUNNING
Thu Jan 16 17:15:22 2014 [LCM][I]: New VM state is UNKNOWN
Thu Jan 16 17:17:49 2014 [VMM][I]: VM found again, state is RUNNING
Thu Jan 16 17:18:00 2014 [LCM][I]: New VM state is UNKNOWN
Thu Jan 16 17:20:27 2014 [VMM][I]: VM found again, state is RUNNING
Thu Jan 16 17:20:34 2014 [LCM][I]: New VM state is UNKNOWN
Thu Jan 16 17:23:04 2014 [VMM][I]: VM found again, state is RUNNING
Thu Jan 16 17:23:08 2014 [LCM][I]: New VM state is UNKNOWN
Thu Jan 16 17:25:41 2014 [VMM][I]: VM found again, state is RUNNING
Thu Jan 16 17:25:43 2014 [LCM][I]: New VM state is UNKNOWN
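
The flapping pattern in this log can be quantified by measuring how long the VM stays UNKNOWN before it is found again; for example, the first gap runs from 16:56:51 to 16:59:01, that is 130 seconds. The sketch below is one way to compute those gaps. It assumes every entry starts with a timestamp in the format shown above, and the names `STAMP_FORMAT`, `STAMP_LENGTH` and `unknown_durations` are hypothetical.

```python
from datetime import datetime

STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"
STAMP_LENGTH = len("Thu Jan 16 16:56:21 2014")


def unknown_durations(log_lines):
    """Return seconds spent in UNKNOWN before each 'VM found again' entry."""
    durations = []
    unknown_since = None
    for line in log_lines:
        line = line.strip()
        try:
            stamp = datetime.strptime(line[:STAMP_LENGTH], STAMP_FORMAT)
        except ValueError:
            continue  # wrapped continuation lines carry no timestamp
        if "New VM state is UNKNOWN" in line:
            unknown_since = stamp
        elif "VM found again" in line and unknown_since is not None:
            durations.append(int((stamp - unknown_since).total_seconds()))
            unknown_since = None
    return durations
```

Comparing these gaps with the monitoring interval configured in oned.conf may help to tell whether the VM is only rediscovered on a full monitoring pass.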

--
Gerry O'Brien

Systems Manager
School of Computer Science and Statistics
Trinity College Dublin
Dublin 2
IRELAND

00 353 1 896 1341

___
Users mailing list
Users@lists.opennebula.org
http://lists.opennebula.org/listinfo.cgi/users-opennebula.org


Re: [one-users] Monitor continually cycles through finding machines RUNNING and stat UNKNOWN

2014-01-17 Thread Liu, Guang Jun (Gene)
I would like to add some input:
we use ONE 4.4 (final) and see this UNKNOWN state for some of the VMs
as well.

Thanks,

Gene

On Fri 17 Jan 2014 12:45:47 PM EST, Ruben S. Montero wrote:
 Hi Gerry

 Just to check, are you using 4.4 Final? We've seen this in the betas
 and thought we had fixed it for the final version. Also, could you check
 that there is just one monitoring process on each host
 (collectd-client.sh, or equivalent, should be the name of the process)?

 Also, could you send us the lines from oned.log between Thu Jan 16
 16:56:25 2014 and Thu Jan 16 17:25:43 2014, plus the first lines, which
 include your oned.conf values (we are especially interested in those
 related to the monitoring interval)?
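
 A possible way to cut the requested window out of oned.log is sketched below. It assumes that oned.log entries start with the same timestamp format as the VM log quoted in this thread; the names `STAMP_FORMAT`, `STAMP_LENGTH` and `lines_between` are hypothetical.

```python
from datetime import datetime

STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"
STAMP_LENGTH = len("Thu Jan 16 16:56:25 2014")


def lines_between(log_lines, start, end):
    """Yield entries stamped within [start, end], plus their continuation lines."""
    keep = False
    for line in log_lines:
        try:
            stamp = datetime.strptime(line.strip()[:STAMP_LENGTH], STAMP_FORMAT)
        except ValueError:
            # No timestamp: treat the line as part of the previous entry.
            if keep:
                yield line
            continue
        keep = start <= stamp <= end
        if keep:
            yield line
```

 For the window asked for here, `start` would be `datetime(2014, 1, 16, 16, 56, 25)` and `end` would be `datetime(2014, 1, 16, 17, 25, 43)`.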


 Cheers

 Ruben








 --
 Ruben S. Montero, PhD
 Project co-Lead and Chief Architect
 OpenNebula - Flexible Enterprise Cloud Made Simple
 www.OpenNebula.org | rsmont...@opennebula.org | @OpenNebula




Re: [one-users] Monitor continually cycles through finding machines RUNNING and stat UNKNOWN

2014-01-17 Thread Ruben S. Montero
OK, thanks

Filed an issue for this:

http://dev.opennebula.org/issues/2656

We'll try to reproduce it also in our infrastructure.

Cheers

Ruben


On Fri, Jan 17, 2014 at 7:00 PM, Liu, Guang Jun (Gene) 
gene@alcatel-lucent.com wrote:

 I would like to add some input:
 we use ONE 4.4 (final) and see this UNKNOWN state for some of the VMs
 as well.

 Thanks,

 Gene
