I've checked on our machines and this is not normal. Kill those processes. They will be started again after some time, hopefully only one of them.
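
A minimal sketch, assuming Python is available on the node, of how the PIDs to kill could be picked out of `ps -ef` output. The function name `find_pids` and its default pattern are made up for illustration and are not part of OpenNebula; the sample rows are the ones quoted below. The sketch only lists PIDs and does not kill anything.

```python
import re
from typing import List


def find_pids(ps_output: str, pattern: str = r"run_probes") -> List[int]:
    """Return the PIDs of `ps -ef` rows whose command matches `pattern`.

    Hypothetical helper for illustration only.
    """
    pids = []
    for line in ps_output.splitlines():
        # `ps -ef` columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces)
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # skips the header row and blank or malformed lines
        if re.search(pattern, fields[7]):
            pids.append(int(fields[1]))
    return pids


# Rows taken from the quoted `ps -ef | grep one` output, with wrapped lines joined.
sample = """\
oneadmin 3349 1 0 12:23 ? 00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 21068 3349 0 12:51 ? 00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 21076 21068 0 12:51 ? 00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 21077 21076 0 12:51 ? 00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
"""

print(find_pids(sample))
```

The resulting PIDs can then be passed to `kill` as the oneadmin user. Calling `find_pids(sample, r"collectd-client\.rb")` instead shows how many collectd clients are running; there should be only one per host.
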

On Tue, Jan 21, 2014 at 1:53 PM, Gerry O'Brien <[email protected]> wrote:
> Hi,
>
> I've gotten down to only one collestd-client.rb process (see below). Are
> the multiple kvm-probes OK?
>
> Regards,
> Gerry
>
> root@host101:~# ps -ef | grep one
> oneadmin 3349 1 0 12:23 ? 00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
> oneadmin 21068 3349 0 12:51 ? 00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
> oneadmin 21076 21068 0 12:51 ? 00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
> oneadmin 21077 21076 0 12:51 ? 00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
>
> On 21/01/2014 10:10, Javier Fontan wrote:
>>
>> It seems that there are more people having this problem and we are
>> taking a look on several ways to fix this. One problem with /var/run
>> is that it is normally owned by root and a process started by oneadmin
>> user can not write there. In the frontend a new directory for
>> OpenNebula pid files is created but in the nodes it does not exist.
>>
>> On Tue, Jan 21, 2014 at 8:07 AM, Gerry O'Brien <[email protected]> wrote:
>>>
>>> Hi Javier,
>>>
>>> See my previous email. Another scenario is when
>>> "/tmp/one-collectd-client.pid" does not exist due to issues with /tmp.
>>>
>>> A change seems to have been made to put a pid file in /tmp instead of
>>> /run or /var/run.
>>>
>>> Regards,
>>> Gerry
>>>
>>> On 20/01/2014 17:44, Javier Fontan wrote:
>>>>
>>>> I've been trying to reproduce the problem, that is, making OpenNebula
>>>> start a high amount of collectd-client processes. The only way I was
>>>> able to do it is when the file "/tmp/one-collectd-client.pid" exists
>>>> and has wrong permissions. Can you check the ownership and permissions
>>>> of that file?
>>>>
>>>> On Mon, Jan 20, 2014 at 4:15 PM, Javier Fontan <[email protected]> wrote:
>>>>>
>>>>> The problem seems to be the high amount of collectd processes running.
>>>>> Try killing all "collectd-client.rb" processes. There should be only
>>>>> one running per host.
>>>>>
>>>>> In case you want to use the old method of monitoring you can follow
>>>>> this guide:
>>>>>
>>>>> http://docs.opennebula.org/stable/administration/monitoring/imsshpullg.html#imsshpullg
>>>>>
>>>>> On Mon, Jan 20, 2014 at 2:17 PM, Gerry O'Brien <[email protected]> wrote:
>>>>>>
>>>>>> Hi Ruben,
>>>>>>
>>>>>> Below is the output of 'ps -ef | grep one' on a host that has been
>>>>>> disabled, rebooted and enabled. There are multiple versions of
>>>>>> collectd-client.rb kvm running.
>>>>>>
>>>>>> We have discovered today a serious issue that is having an adverse
>>>>>> effect on our DNS system. When the machines below was enabled,
>>>>>> immediately our DNS server is flooded with requests from the host
>>>>>> (see a sample below). Our logs show that this has only started
>>>>>> happening since the upgrade to 4.4. If we don't get a fix for this
>>>>>> we will have to go back to 4.2, which is something I really don't
>>>>>> want to do.
>>>>>>
>>>>>> Regards,
>>>>>> Gerry
>>>>>>
>>>>>> oneadmin 3628 1 0 13:04 ? 00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
>>>>>> oneadmin 4600 1 0 13:05 ? 00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
>>>>>> oneadmin 6400 1 0 13:07 ? 00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
>>>>>> oneadmin 9003 1 0 13:08 ? 00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
>>>>>> oneadmin 12953 3628 0 13:10 ? 00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
>>>>>> oneadmin 12955 6400 0 13:10 ? 00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
>>>>>> oneadmin 12969 12953 0 13:10 ? 00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
>>>>>> oneadmin 12970 12969 0 13:10 ? 00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
>>>>>> oneadmin 12972 12955 0 13:10 ? 00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
>>>>>> oneadmin 12973 12972 0 13:10 ? 00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
>>>>>> oneadmin 13029 12973 0 13:10 ? 00:00:00 /bin/bash ./monitor_ds.sh kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
>>>>>> oneadmin 13030 12970 0 13:10 ? 00:00:00 /bin/bash ./monitor_ds.sh kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
>>>>>>
>>>>>> -2014 13:14:26.675 client 134.226.59.101#52314: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:26.680 client 134.226.59.101#51356: query: host101.scss.tcd.ie IN A + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:26.680 client 134.226.59.101#51356: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:26.822 client 134.226.59.101#47870: query: host101.scss.tcd.ie IN A + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:26.822 client 134.226.59.101#47870: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:26.824 client 134.226.59.101#58734: query: host101.scss.tcd.ie IN A + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:26.825 client 134.226.59.101#58734: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:26.952 client 134.226.59.101#39659: query: host101.scss.tcd.ie IN A + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:26.952 client 134.226.59.101#39659: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:26.952 client 134.226.59.101#53975: query: host101.scss.tcd.ie IN A + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:26.953 client 134.226.59.101#53975: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:27.108 client 134.226.59.101#36294: query: host101.scss.tcd.ie IN A + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:27.108 client 134.226.59.101#36294: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:27.109 client 134.226.59.101#59277: query: host101.scss.tcd.ie IN A + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:27.109 client 134.226.59.101#59277: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:27.347 client 134.226.59.101#49614: query: host101.scss.tcd.ie IN A + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:27.348 client 134.226.59.101#49614: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:27.350 client 134.226.59.101#44058: query: host101.scss.tcd.ie IN A + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:27.357 client 134.226.59.101#44058: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:27.458 client 134.226.59.101#51830: query: host101.scss.tcd.ie IN A + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:27.458 client 134.226.59.101#51830: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:27.461 client 134.226.59.101#38419: query: host101.scss.tcd.ie IN A + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:27.461 client 134.226.59.101#38419: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:31.184 client 134.226.59.101#38617: query: host101.scss.tcd.ie IN A + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:31.184 client 134.226.59.101#38617: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:31.302 client 134.226
>>>>>>
>>>>>> On 17/01/2014 17:45, Ruben S. Montero wrote:
>>>>>>>
>>>>>>> Hi Gerry
>>>>>>>
>>>>>>> Just to check, are you using 4.4 Final? We've seen this in the betas
>>>>>>> and "thought" we fixed for the final version. Also could you check
>>>>>>> that there are just one monitorization process at the hosts
>>>>>>> (collectd-client.sh, or equiv should be the name of the process)
>>>>>>>
>>>>>>> Also could you send us the lines from oned.log between Thu Jan 16
>>>>>>> 16:56:25 2014 and Thu Jan 16 17:25:43 2014; plus the first lines that
>>>>>>> includes you oned.conf values (we are interested specially in those
>>>>>>> related to monitoring interval)
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>> Ruben
>>>>>>>
>>>>>>> On Fri, Jan 17, 2014 at 2:27 PM, Gerry O'Brien <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Below is a truncated log file for a VM. The monitor continually
>>>>>>>> cycles through finding the machine RUNNING and stat UNKNOWN. This
>>>>>>>> occurs for many many machines at the same time. All machines were
>>>>>>>> created by a script.
>>>>>>>>
>>>>>>>> The VMs are Microsoft Windows 7 64bit Enterprise. Individual
>>>>>>>> context is created by a startup script. They run fine but
>>>>>>>> eventually /var/log/one is going overflow.
>>>>>>>>
>>>>>>>> Restarting oned seems to fix the problem but this is hardly a long
>>>>>>>> term solution.
>>>>>>>>
>>>>>>>> Any suggestions on what could be causing this?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Gerry
>>>>>>>>
>>>>>>>> Thu Jan 16 16:56:21 2014 [DiM][I]: New VM state is ACTIVE.
>>>>>>>> Thu Jan 16 16:56:22 2014 [LCM][I]: New VM state is PROLOG.
>>>>>>>> Thu Jan 16 16:56:22 2014 [VM][I]: Virtual Machine has no context
>>>>>>>> Thu Jan 16 16:56:22 2014 [LCM][I]: New VM state is BOOT
>>>>>>>> Thu Jan 16 16:56:22 2014 [VMM][I]: Generating deployment file: /var/lib/one/vms/1788/deployment.0
>>>>>>>> Thu Jan 16 16:56:23 2014 [VMM][I]: ExitCode: 0
>>>>>>>> Thu Jan 16 16:56:23 2014 [VMM][I]: Successfully execute network driver operation: pre.
>>>>>>>> Thu Jan 16 16:56:25 2014 [VMM][I]: ExitCode: 0
>>>>>>>> Thu Jan 16 16:56:25 2014 [VMM][I]: Successfully execute virtualization driver operation: deploy.
>>>>>>>> Thu Jan 16 16:56:25 2014 [VMM][I]: ExitCode: 0
>>>>>>>> Thu Jan 16 16:56:25 2014 [VMM][I]: Successfully execute network driver operation: post.
>>>>>>>> Thu Jan 16 16:56:25 2014 [LCM][I]: New VM state is RUNNING
>>>>>>>> Thu Jan 16 16:56:51 2014 [LCM][I]: New VM state is UNKNOWN
>>>>>>>> Thu Jan 16 16:59:01 2014 [VMM][I]: VM found again, state is RUNNING
>>>>>>>> Thu Jan 16 16:59:23 2014 [LCM][I]: New VM state is UNKNOWN
>>>>>>>> Thu Jan 16 17:01:41 2014 [VMM][I]: VM found again, state is RUNNING
>>>>>>>> Thu Jan 16 17:01:58 2014 [LCM][I]: New VM state is UNKNOWN
>>>>>>>> Thu Jan 16 17:04:18 2014 [VMM][I]: VM found again, state is RUNNING
>>>>>>>> Thu Jan 16 17:04:39 2014 [LCM][I]: New VM state is UNKNOWN
>>>>>>>> Thu Jan 16 17:06:55 2014 [VMM][I]: VM found again, state is RUNNING
>>>>>>>> Thu Jan 16 17:07:06 2014 [LCM][I]: New VM state is UNKNOWN
>>>>>>>> Thu Jan 16 17:09:31 2014 [VMM][I]: VM found again, state is RUNNING
>>>>>>>> Thu Jan 16 17:09:31 2014 [LCM][I]: New VM state is UNKNOWN
>>>>>>>> Thu Jan 16 17:12:22 2014 [VMM][I]: VM found again, state is RUNNING
>>>>>>>> Thu Jan 16 17:12:27 2014 [LCM][I]: New VM state is UNKNOWN
>>>>>>>> Thu Jan 16 17:15:11 2014 [VMM][I]: VM found again, state is RUNNING
>>>>>>>> Thu Jan 16 17:15:22 2014 [LCM][I]: New VM state is UNKNOWN
>>>>>>>> Thu Jan 16 17:17:49 2014 [VMM][I]: VM found again, state is RUNNING
>>>>>>>> Thu Jan 16 17:18:00 2014 [LCM][I]: New VM state is UNKNOWN
>>>>>>>> Thu Jan 16 17:20:27 2014 [VMM][I]: VM found again, state is RUNNING
>>>>>>>> Thu Jan 16 17:20:34 2014 [LCM][I]: New VM state is UNKNOWN
>>>>>>>> Thu Jan 16 17:23:04 2014 [VMM][I]: VM found again, state is RUNNING
>>>>>>>> Thu Jan 16 17:23:08 2014 [LCM][I]: New VM state is UNKNOWN
>>>>>>>> Thu Jan 16 17:25:41 2014 [VMM][I]: VM found again, state is RUNNING
>>>>>>>> Thu Jan 16 17:25:43 2014 [LCM][I]: New VM state is UNKNOWN
>>>>>>>>
>>>>>>>> --
>>>>>>>> Gerry O'Brien
>>>>>>>>
>>>>>>>> Systems Manager
>>>>>>>> School of Computer Science and Statistics
>>>>>>>> Trinity College Dublin
>>>>>>>> Dublin 2
>>>>>>>> IRELAND
>>>>>>>>
>>>>>>>> 00 353 1 896 1341
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Users mailing list
>>>>>>>> [email protected]
>>>>>>>> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>>>>>>>>
>>>>>> --
>>>>>> Gerry O'Brien
>>>>>>
>>>>>> Systems Manager
>>>>>> School of Computer Science and Statistics
>>>>>> Trinity College Dublin
>>>>>> Dublin 2
>>>>>> IRELAND
>>>>>>
>>>>>> 00 353 1 896 1341
>>>>>>
>>>>>> _______________________________________________
>>>>>> Users mailing list
>>>>>> [email protected]
>>>>>> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>>>>>
>>>>> --
>>>>> Javier Fontán Muiños
>>>>> Developer
>>>>> OpenNebula - The Open Source Toolkit for Data Center Virtualization
>>>>> www.OpenNebula.org | @OpenNebula | github.com/jfontan
>>>>
>>>
>>> --
>>> Gerry O'Brien
>>>
>>> Systems Manager
>>> School of Computer Science and Statistics
>>> Trinity College Dublin
>>> Dublin 2
>>> IRELAND
>>>
>>> 00 353 1 896 1341
>>>
>>
>
> --
> Gerry O'Brien
>
> Systems Manager
> School of Computer Science and Statistics
> Trinity College Dublin
> Dublin 2
> IRELAND
>
> 00 353 1 896 1341
>

--
Javier Fontán Muiños
Developer
OpenNebula - The Open Source Toolkit for Data Center Virtualization
www.OpenNebula.org | @OpenNebula | github.com/jfontan
_______________________________________________
Users mailing list
[email protected]
http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
