Re: [one-users] Monitor continually cycles through finding machines RUNNING and stat UNKNOWN
On 21/01/2014 10:10, Javier Fontan wrote:

It seems that more people are having this problem, and we are looking at several ways to fix it. One problem with /var/run is that it is normally owned by root, so a process started by the oneadmin user cannot write there. On the frontend a new directory for OpenNebula pid files is created, but on the nodes it does not exist.
Re: [one-users] Monitor continually cycles through finding machines RUNNING and stat UNKNOWN
Javier Fontan jfon...@opennebula.org writes:

    It seems that more people are having this problem, and we are looking at several ways to fix it. One problem with /var/run is that it is normally owned by root, so a process started by the oneadmin user cannot write there. On the frontend a new directory for OpenNebula pid files is created, but on the nodes it does not exist.

Hello,

What do you think about a "ONE node setup init script"?

On my Debian systems I have opennebula-common and opennebula-node on each node. The -node package could include an init script that sets up an opennebula directory in /var/run [1] with the proper owner and permissions.

My 2¢.

Regards.

Footnotes:
[1] /run now on Debian systems

--
Daniel Dehennin
Retrieve my GPG key: gpg --keyserver pgp.mit.edu --recv-keys 0x7A6FE2DF

Users mailing list
Users@lists.opennebula.org
http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
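The init-script idea suggested by Daniel could be sketched roughly as follows. This is only an illustration: the function name setup_pid_dir, the directory /var/run/one and the 0755 mode are assumptions made for the example, not something the OpenNebula packages are known to ship.

```shell
#!/bin/sh
# Hypothetical helper: create a pid directory owned by the monitoring user.
# Usage: setup_pid_dir DIR OWNER GROUP
setup_pid_dir() {
    dir=$1
    owner=$2
    group=$3
    mkdir -p "$dir" || return 1
    chown "$owner:$group" "$dir" || return 1
    chmod 0755 "$dir"
}

# Example, run as root from an init script (path and user are assumptions):
#   setup_pid_dir /var/run/one oneadmin oneadmin
```

Where /run is a tmpfs, the directory disappears at reboot, so a step like this would have to run at every boot rather than once at package installation.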
Re: [one-users] Monitor continually cycles through finding machines RUNNING and stat UNKNOWN
Hi,

I've gotten down to only one collectd-client.rb process (see below). Are the multiple kvm-probes OK?

Regards,
Gerry

root@host101:~# ps -ef | grep one
oneadmin  3349     1  0 12:23 ?        00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 21068  3349  0 12:51 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 21076 21068  0 12:51 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 21077 21076  0 12:51 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
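One way to check whether the run_probes entries in Gerry's `ps -ef | grep one` listing all hang off the single collectd-client.rb is to follow the PPID column. The sketch below reads `ps -ef` output on standard input and prints every descendant of a given PID; the function name `descendants` is made up for this example.

```shell
#!/bin/sh
# Print the PIDs of all descendants of ROOT_PID, one per line.
# Input: `ps -ef` output on stdin (columns: UID PID PPID ...).
# Usage: ps -ef | descendants ROOT_PID
descendants() {
    awk -v root="$1" '
        # Remember PID and PPID of every process line (the header is skipped).
        $2 ~ /^[0-9]+$/ { pid[NR] = $2; ppid[NR] = $3; n = NR }
        END {
            seen[root] = 1
            # Repeat until a full pass finds no new descendant.
            do {
                changed = 0
                for (i = 1; i <= n; i++) {
                    if ((i in pid) && (ppid[i] in seen) && !(pid[i] in seen)) {
                        seen[pid[i]] = 1
                        print pid[i]
                        changed = 1
                    }
                }
            } while (changed)
        }'
}
```

Fed the four lines of the host101 listing with root PID 3349, this prints 21068, 21076 and 21077: each run_probes is a child of the previous one, which is consistent with a single chain of nested shells rather than several independent monitoring clients.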
Re: [one-users] Monitor continually cycles through finding machines RUNNING and stat UNKNOWN
Hi Ruben,

Below is the output of 'ps -ef | grep one' on a host that has been disabled, rebooted and enabled. There are multiple instances of collectd-client.rb kvm running.

We discovered today a serious issue that is having an adverse effect on our DNS system. When the machine below was enabled, our DNS server was immediately flooded with requests from the host (see a sample below). Our logs show that this has only started happening since the upgrade to 4.4.

If we don't get a fix for this we will have to go back to 4.2, which is something I really don't want to do.

Regards,
Gerry

oneadmin  3628     1  0 13:04 ?        00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin  4600     1  0 13:05 ?        00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin  6400     1  0 13:07 ?        00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin  9003     1  0 13:08 ?        00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 12953  3628  0 13:10 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 12955  6400  0 13:10 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 12969 12953  0 13:10 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 12970 12969  0 13:10 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 12972 12955  0 13:10 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 12973 12972  0 13:10 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 13029 12973  0 13:10 ?        00:00:00 /bin/bash ./monitor_ds.sh kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 13030 12970  0 13:10 ?        00:00:00 /bin/bash ./monitor_ds.sh kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie

-2014 13:14:26.675 client 134.226.59.101#52314: query: host101.scss.tcd.ie IN + (134.226.32.57)
20-Jan-2014 13:14:26.680 client 134.226.59.101#51356: query: host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:26.680 client 134.226.59.101#51356: query: host101.scss.tcd.ie IN + (134.226.32.57)
20-Jan-2014 13:14:26.822 client 134.226.59.101#47870: query: host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:26.822 client 134.226.59.101#47870: query: host101.scss.tcd.ie IN + (134.226.32.57)
20-Jan-2014 13:14:26.824 client 134.226.59.101#58734: query: host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:26.825 client 134.226.59.101#58734: query: host101.scss.tcd.ie IN + (134.226.32.57)
20-Jan-2014 13:14:26.952 client 134.226.59.101#39659: query: host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:26.952 client 134.226.59.101#39659: query: host101.scss.tcd.ie IN + (134.226.32.57)
20-Jan-2014 13:14:26.952 client 134.226.59.101#53975: query: host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:26.953 client 134.226.59.101#53975: query: host101.scss.tcd.ie IN + (134.226.32.57)
20-Jan-2014 13:14:27.108 client 134.226.59.101#36294: query: host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:27.108 client 134.226.59.101#36294: query: host101.scss.tcd.ie IN + (134.226.32.57)
20-Jan-2014 13:14:27.109 client 134.226.59.101#59277: query: host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:27.109 client 134.226.59.101#59277: query: host101.scss.tcd.ie IN + (134.226.32.57)
20-Jan-2014 13:14:27.347 client 134.226.59.101#49614: query: host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:27.348 client 134.226.59.101#49614: query: host101.scss.tcd.ie IN + (134.226.32.57)
20-Jan-2014 13:14:27.350 client 134.226.59.101#44058: query: host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:27.357 client 134.226.59.101#44058: query: host101.scss.tcd.ie IN + (134.226.32.57)
20-Jan-2014 13:14:27.458 client 134.226.59.101#51830: query: host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:27.458 client 134.226.59.101#51830: query: host101.scss.tcd.ie IN + (134.226.32.57)
20-Jan-2014 13:14:27.461 client 134.226.59.101#38419: query: host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:27.461 client 134.226.59.101#38419: query: host101.scss.tcd.ie IN + (134.226.32.57)
20-Jan-2014 13:14:31.184
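To put a number on the DNS flood reported in this message, the query-log entries can be grouped per client address and per second. The following is a rough sketch that assumes the log layout seen in Gerry's sample (date, time, the word client, then address#port, then query:); the function name is invented for the example.

```shell
#!/bin/sh
# Count DNS queries per client IP and per second.
# Input: query-log lines on stdin, e.g.
#   20-Jan-2014 13:14:26.680 client 134.226.59.101#51356: query: ...
# Output: "<count> <client-ip> <date> <HH:MM:SS>", sorted by client and time.
queries_per_second() {
    awk '$3 == "client" && $5 == "query:" {
            split($4, addr, "#")      # addr[1] = client IP without the port
            split($2, t, ".")         # t[1] = HH:MM:SS without milliseconds
            count[addr[1] " " $1 " " t[1]]++
        }
        END { for (k in count) print count[k], k }' | sort -k2
}

# Example (the log file location depends on the name server configuration):
#   queries_per_second < query.log
```

A host that asks for its own name a dozen or more times per second, as in the sample, stands out immediately in this kind of summary.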
Re: [one-users] Monitor continually cycles through finding machines RUNNING and stat UNKNOWN
On Mon, Jan 20, 2014 at 4:15 PM, Javier Fontan jfon...@opennebula.org wrote:

The problem seems to be the large number of collectd processes running. Try killing all collectd-client.rb processes; there should be only one running per host.

In case you want to use the old method of monitoring, you can follow this guide: http://docs.opennebula.org/stable/administration/monitoring/imsshpullg.html#imsshpullg
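Before following Javier's advice to kill all collectd-client.rb processes, it may help to see how many are actually running on a node. A small sketch, assuming the `ps -ef` column layout shown in this thread (the function name is hypothetical):

```shell
#!/bin/sh
# Print the PID of every collectd-client.rb process found in `ps -ef` output.
# Input: `ps -ef` output on stdin; the PID is the second column.
collectd_pids() {
    awk '/collectd-client\.rb/ && $2 ~ /^[0-9]+$/ { print $2 }'
}

# Example on a node:
#   ps -ef | collectd_pids                    # more than one PID means duplicates
#   ps -ef | collectd_pids | xargs -r kill    # stop them all (-r is a GNU xargs option)
```

The escaped dot in the pattern also keeps the awk process from matching its own command line, which is a common nuisance with `ps | grep` pipelines.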
Re: [one-users] Monitor continually cycles through finding machines RUNNING and stat UNKNOWN - Possibly solved
Hi,

I think we've figured out the cause of the issues reported above, and they are particular to our installation. All our hosts use an NFS-mounted root partition. The reasons for this approach are historical; it was supposed to make it easier to keep the hosts equally up to date.

The issue here was that /tmp was the same for every host. Because the PID file is written to /tmp, collectd-client_control.sh couldn't find the PID of the already running collectd-client.rb and started multiple instances of it.

My guess is that the DNS issue is related to the explicit use of the hostname in

ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 3 host104.scss.tcd.ie

This seems to have changed since 4.2. The multiple copies of collectd-client.rb only exacerbated the problem. As we have a single hosts file for every host, the solution was to place entries for all hosts in /etc/hosts.

Regards,
Gerry
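The shared-/tmp failure mode Gerry describes can be illustrated with a generic pid-file check. This is not the code of collectd-client_control.sh, only a sketch of the usual pattern, under the assumption that the control script decides whether to start a client by looking up the PID stored in the file. The function name is made up.

```shell
#!/bin/sh
# Return 0 if PIDFILE exists and names a process that is alive on THIS host.
# Usage: pid_is_running PIDFILE
pid_is_running() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1       # missing or unreadable file
    pid=$(cat "$pidfile")
    case $pid in
        ''|*[!0-9]*) return 1 ;;        # empty or not a number
    esac
    # Signal 0 sends nothing; it only tests whether the PID can be signalled.
    # It also fails when the process belongs to another user.
    kill -0 "$pid" 2>/dev/null
}

# Example (path taken from this thread):
#   pid_is_running /tmp/one-collectd-client.pid || echo "would start a new client"
```

With one /tmp shared by all hosts, the file may hold a PID written on a different machine. The local lookup then fails even though a client is already running locally, and a check of this kind would conclude that a new one has to be started.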
Re: [one-users] Monitor continually cycles through finding machines RUNNING and stat UNKNOWN
On 20/01/2014 17:44, Javier Fontan wrote:

I've been trying to reproduce the problem, that is, making OpenNebula start a large number of collectd-client processes. The only way I was able to do it is when the file /tmp/one-collectd-client.pid exists and has wrong permissions. Can you check the ownership and permissions of that file?
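For the ownership and permissions check Javier asks about, `ls -l` on the node shows the owner and mode of the pid file. As a sketch, the helper below (its name and messages are invented for this example) reports whether the file exists and whether the current user can write to it, which is what matters for a client running as oneadmin.

```shell
#!/bin/sh
# Report the state of a pid file from the point of view of the current user.
# Prints one of: missing, writable, not writable.
check_pidfile() {
    f=$1
    if [ ! -e "$f" ]; then
        echo "missing"
    elif [ -w "$f" ]; then
        echo "writable"
    else
        echo "not writable"
    fi
}

# Example on a node, run as the monitoring user:
#   ls -l /tmp/one-collectd-client.pid
#   check_pidfile /tmp/one-collectd-client.pid
```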
Re: [one-users] Monitor continually cycles through finding machines RUNNING and stat UNKNOWN
Hi Javier, See my previous email. Another scenario is when /tmp/one-collectd-client.pid does not exist due to issues with /tmp. A change seems to have been made to put a pid file in /tmp instead of /run or /var/run. Regards, Gerry On 20/01/2014 17:44, Javier Fontan wrote: I've been trying to reproduce the problem, that is, making OpenNebula start a high amount of collectd-client processes. The only way I was able to do it is when the file /tmp/one-collectd-client.pid exists and has wrong permissions. Can you check the ownership and permissions of that file? On Mon, Jan 20, 2014 at 4:15 PM, Javier Fontan jfon...@opennebula.org wrote: The problem seems to be the high amount of collectd processes running. Try killing all collectd-client.rb processes. There should be only one running per host. In case you want to use the old method of monitoring you can follow this guide: http://docs.opennebula.org/stable/administration/monitoring/imsshpullg.html#imsshpullg On Mon, Jan 20, 2014 at 2:17 PM, Gerry O'Brien ge...@scss.tcd.ie wrote: Hi Ruben, Below is the output of 'ps -ef | grep one' on a host that has been disabled, rebooted and enabled. There are multiple versions of collectd-client.rb kvm running. We have discovered today a serious issue that is having an adverse effect on our DNS system. When the machines below was enabled, immediately our DNS server is flooded with requests from the host (see a sample below). Our logs show that this has only started happening since the upgrade to 4.4. If we don't get a fix for this we will have to go back to 4.2, which is something I really don't want to do. 
Regards,
Gerry

oneadmin 3628 1 0 13:04 ? 00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 4600 1 0 13:05 ? 00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 6400 1 0 13:07 ? 00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 9003 1 0 13:08 ? 00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 12953 3628 0 13:10 ? 00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 12955 6400 0 13:10 ? 00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 12969 12953 0 13:10 ? 00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 12970 12969 0 13:10 ? 00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 12972 12955 0 13:10 ? 00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 12973 12972 0 13:10 ? 00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 13029 12973 0 13:10 ? 00:00:00 /bin/bash ./monitor_ds.sh kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 13030 12970 0 13:10 ? 00:00:00 /bin/bash ./monitor_ds.sh kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie

-2014 13:14:26.675 client 134.226.59.101#52314: query: host101.scss.tcd.ie IN + (134.226.32.57)
20-Jan-2014 13:14:26.680 client 134.226.59.101#51356: query: host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:26.680 client 134.226.59.101#51356: query: host101.scss.tcd.ie IN + (134.226.32.57)
20-Jan-2014 13:14:26.822 client 134.226.59.101#47870: query: host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:26.822 client 134.226.59.101#47870: query: host101.scss.tcd.ie IN + (134.226.32.57)
20-Jan-2014 13:14:26.824 client 134.226.59.101#58734: query: host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:26.825 client 134.226.59.101#58734: query: host101.scss.tcd.ie IN + (134.226.32.57)
20-Jan-2014 13:14:26.952 client 134.226.59.101#39659: query: host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:26.952 client 134.226.59.101#39659: query: host101.scss.tcd.ie IN + (134.226.32.57)
20-Jan-2014 13:14:26.952 client 134.226.59.101#53975: query: host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:26.953 client 134.226.59.101#53975: query: host101.scss.tcd.ie IN + (134.226.32.57)
20-Jan-2014 13:14:27.108 client 134.226.59.101#36294: query: host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:27.108 client 134.226.59.101#36294: query: host101.scss.tcd.ie IN + (134.226.32.57)
20-Jan-2014 13:14:27.109 client 134.226.59.101#59277:
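For anyone wanting to check their own hosts against the "only one collectd-client.rb per host" rule mentioned above, here is a rough Python sketch that scans `ps -ef` output for those processes. The function name `find_collectd_clients` and the sample text are made up for illustration and are not part of OpenNebula; the sketch only assumes the usual `ps -ef` layout, where the PID is the second column.

```python
def find_collectd_clients(ps_output):
    """Return the PIDs of collectd-client.rb processes found in `ps -ef` output."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        # Skip the grep process itself if the output came from `ps -ef | grep ...`.
        if "grep" in fields:
            continue
        # Any argument ending in collectd-client.rb marks a monitoring client.
        if any(f.endswith("collectd-client.rb") for f in fields[2:]):
            pids.append(int(fields[1]))  # second column of `ps -ef` is the PID
    return pids


# Sample lines shortened from the listing in this thread.
SAMPLE = """\
oneadmin 3628 1 0 13:04 ? 00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm
oneadmin 4600 1 0 13:05 ? 00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm
oneadmin 6400 1 0 13:07 ? 00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm
oneadmin 9003 1 0 13:08 ? 00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm
oneadmin 12953 3628 0 13:10 ? 00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes
"""

if __name__ == "__main__":
    pids = find_collectd_clients(SAMPLE)
    if len(pids) > 1:
        print("more than one collectd-client.rb running:", pids)
```

More than one PID in the result would suggest the duplicate-client situation described in this thread; the run_probes children are deliberately not counted.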
[one-users] Monitor continually cycles through finding machines RUNNING and stat UNKNOWN
Hi,

Below is a truncated log file for a VM. The monitor continually cycles through finding the machine RUNNING and state UNKNOWN. This occurs for many machines at the same time. All machines were created by a script. The VMs are Microsoft Windows 7 64-bit Enterprise. Individual context is created by a startup script. They run fine, but eventually /var/log/one is going to overflow. Restarting oned seems to fix the problem, but this is hardly a long-term solution.

Any suggestions on what could be causing this?

Regards,
Gerry

Thu Jan 16 16:56:21 2014 [DiM][I]: New VM state is ACTIVE.
Thu Jan 16 16:56:22 2014 [LCM][I]: New VM state is PROLOG.
Thu Jan 16 16:56:22 2014 [VM][I]: Virtual Machine has no context
Thu Jan 16 16:56:22 2014 [LCM][I]: New VM state is BOOT
Thu Jan 16 16:56:22 2014 [VMM][I]: Generating deployment file: /var/lib/one/vms/1788/deployment.0
Thu Jan 16 16:56:23 2014 [VMM][I]: ExitCode: 0
Thu Jan 16 16:56:23 2014 [VMM][I]: Successfully execute network driver operation: pre.
Thu Jan 16 16:56:25 2014 [VMM][I]: ExitCode: 0
Thu Jan 16 16:56:25 2014 [VMM][I]: Successfully execute virtualization driver operation: deploy.
Thu Jan 16 16:56:25 2014 [VMM][I]: ExitCode: 0
Thu Jan 16 16:56:25 2014 [VMM][I]: Successfully execute network driver operation: post.
Thu Jan 16 16:56:25 2014 [LCM][I]: New VM state is RUNNING
Thu Jan 16 16:56:51 2014 [LCM][I]: New VM state is UNKNOWN
Thu Jan 16 16:59:01 2014 [VMM][I]: VM found again, state is RUNNING
Thu Jan 16 16:59:23 2014 [LCM][I]: New VM state is UNKNOWN
Thu Jan 16 17:01:41 2014 [VMM][I]: VM found again, state is RUNNING
Thu Jan 16 17:01:58 2014 [LCM][I]: New VM state is UNKNOWN
Thu Jan 16 17:04:18 2014 [VMM][I]: VM found again, state is RUNNING
Thu Jan 16 17:04:39 2014 [LCM][I]: New VM state is UNKNOWN
Thu Jan 16 17:06:55 2014 [VMM][I]: VM found again, state is RUNNING
Thu Jan 16 17:07:06 2014 [LCM][I]: New VM state is UNKNOWN
Thu Jan 16 17:09:31 2014 [VMM][I]: VM found again, state is RUNNING
Thu Jan 16 17:09:31 2014 [LCM][I]: New VM state is UNKNOWN
Thu Jan 16 17:12:22 2014 [VMM][I]: VM found again, state is RUNNING
Thu Jan 16 17:12:27 2014 [LCM][I]: New VM state is UNKNOWN
Thu Jan 16 17:15:11 2014 [VMM][I]: VM found again, state is RUNNING
Thu Jan 16 17:15:22 2014 [LCM][I]: New VM state is UNKNOWN
Thu Jan 16 17:17:49 2014 [VMM][I]: VM found again, state is RUNNING
Thu Jan 16 17:18:00 2014 [LCM][I]: New VM state is UNKNOWN
Thu Jan 16 17:20:27 2014 [VMM][I]: VM found again, state is RUNNING
Thu Jan 16 17:20:34 2014 [LCM][I]: New VM state is UNKNOWN
Thu Jan 16 17:23:04 2014 [VMM][I]: VM found again, state is RUNNING
Thu Jan 16 17:23:08 2014 [LCM][I]: New VM state is UNKNOWN
Thu Jan 16 17:25:41 2014 [VMM][I]: VM found again, state is RUNNING
Thu Jan 16 17:25:43 2014 [LCM][I]: New VM state is UNKNOWN

--
Gerry O'Brien
Systems Manager
School of Computer Science and Statistics
Trinity College Dublin
Dublin 2
IRELAND
00 353 1 896 1341

___
Users mailing list
Users@lists.opennebula.org
http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
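To get a feel for how often a VM flips between RUNNING and UNKNOWN in a log like the one above, a small Python sketch along these lines may help. The name `count_flaps` and the regular expression are assumptions based only on the two message formats visible in this log ("New VM state is X" and "VM found again, state is X"); other OpenNebula versions may word these lines differently.

```python
import re

# Matches the two state messages seen in the VM log in this thread.
STATE_RE = re.compile(
    r"\[(?:LCM|VMM)\]\[I\]: (?:New VM state is|VM found again, state is) (\w+)"
)


def count_flaps(log_text):
    """Count RUNNING -> UNKNOWN transitions in a VM log."""
    states = STATE_RE.findall(log_text)
    flaps = 0
    # Compare each reported state with the one that follows it.
    for prev, cur in zip(states, states[1:]):
        if prev == "RUNNING" and cur == "UNKNOWN":
            flaps += 1
    return flaps


# First few state lines copied from the log in this thread.
SAMPLE_LOG = """\
Thu Jan 16 16:56:25 2014 [LCM][I]: New VM state is RUNNING
Thu Jan 16 16:56:51 2014 [LCM][I]: New VM state is UNKNOWN
Thu Jan 16 16:59:01 2014 [VMM][I]: VM found again, state is RUNNING
Thu Jan 16 16:59:23 2014 [LCM][I]: New VM state is UNKNOWN
"""

if __name__ == "__main__":
    print("RUNNING -> UNKNOWN transitions:", count_flaps(SAMPLE_LOG))
```

A steadily growing count for many VMs at once would match the symptom reported here, as opposed to a single VM that genuinely went away.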
Re: [one-users] Monitor continually cycles through finding machines RUNNING and stat UNKNOWN
I would like to add that we use ONE 4.4 (final) and see this UNKNOWN state for some of the VMs as well.

Thanks,
Gene

On Fri 17 Jan 2014 12:45:47 PM EST, Ruben S. Montero wrote:

Hi Gerry,

Just to check, are you using 4.4 Final? We've seen this in the betas and thought we had fixed it for the final version.

Also, could you check that there is just one monitoring process on the hosts (collectd-client.sh, or equivalent, should be the name of the process)?

Also, could you send us the lines from oned.log between Thu Jan 16 16:56:25 2014 and Thu Jan 16 17:25:43 2014, plus the first lines, which include your oned.conf values (we are especially interested in those related to the monitoring interval)?

Cheers,
Ruben

--
Ruben S. Montero, PhD
Project co-Lead and Chief Architect
OpenNebula - Flexible Enterprise Cloud Made Simple
www.OpenNebula.org | rsmont...@opennebula.org | @OpenNebula
Re: [one-users] Monitor continually cycles through finding machines RUNNING and stat UNKNOWN
OK, thanks.

Filed an issue for this: http://dev.opennebula.org/issues/2656

We'll also try to reproduce it in our infrastructure.

Cheers,
Ruben