I've checked on our machines and this is not normal. Kill those
processes. After some time the probes will be started again, hopefully
only one instance.
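A minimal sketch of this cleanup. It assumes a POSIX shell with awk on the node and that you run it as oneadmin or root. The function name `probe_pids` is hypothetical. It only filters `ps -ef` output like the listing quoted below and prints the PIDs of the `run_probes kvm-probes` processes.

```shell
#!/bin/sh
# Hypothetical helper: print the PIDs of "run_probes kvm-probes" processes
# found in `ps -ef` output read from stdin (the PID is the 2nd column).
# Lines mentioning awk are skipped so the filter never matches itself.
probe_pids() {
    awk '/run_probes kvm-probes/ && !/awk/ { print $2 }'
}

# Usage on a node (as oneadmin or root): terminate every listed probe.
# OpenNebula is expected to start the probes again on a later cycle.
#   ps -ef | probe_pids | xargs -r kill
```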

On Tue, Jan 21, 2014 at 1:53 PM, Gerry O'Brien <[email protected]> wrote:
> Hi,
>
>     I've gotten down to only one collectd-client.rb process (see below). Are
> the multiple kvm-probes processes OK?
>
>         Regards,
>           Gerry
>
>
>
>
> root@host101:~# ps -ef | grep one
> oneadmin  3349     1  0 12:23 ?        00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
> oneadmin 21068  3349  0 12:51 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
> oneadmin 21076 21068  0 12:51 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
> oneadmin 21077 21076  0 12:51 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
>
>
>
>
>
> On 21/01/2014 10:10, Javier Fontan wrote:
>>
>> It seems that more people are having this problem, and we are
>> looking at several ways to fix it. One problem with /var/run is that
>> it is normally owned by root, so a process started by the oneadmin
>> user cannot write there. On the frontend a dedicated directory for
>> OpenNebula pid files is created, but it does not exist on the nodes.
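A sketch of the writability check behind this point, assuming a POSIX shell. `pick_pid_dir` and the candidate `/var/run/one` in the example are hypothetical and do not describe OpenNebula's actual layout. When run as oneadmin, a root-owned /var/run fails the `-w` test, so the next candidate is used.

```shell
#!/bin/sh
# Hypothetical helper: print the first candidate directory that exists and
# is writable by the current user (e.g. oneadmin); fail if none qualifies.
pick_pid_dir() {
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Example (candidate list is an assumption):
#   pick_pid_dir /var/run/one /var/run /tmp
```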
>>
>> On Tue, Jan 21, 2014 at 8:07 AM, Gerry O'Brien <[email protected]> wrote:
>>>
>>> Hi Javier,
>>>
>>>    See my previous email. Another scenario is when
>>> "/tmp/one-collectd-client.pid" does not exist due to issues with /tmp.
>>>
>>>     A change seems to have been made to put a pid file in /tmp instead of
>>> /run or /var/run.
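A minimal sketch of how a pid-file single-instance guard typically works. It shows why both scenarios in this thread can produce duplicate processes: a missing pid file, or one with the wrong permissions. This is an assumed illustration in shell, not the actual collectd-client.rb code. `already_running` and `start_once` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical pid-file guard; NOT the real collectd-client.rb logic.

# Succeed if PIDFILE exists, is readable and names a live process.
# Note: kill -0 also fails for a live process owned by another user.
already_running() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Start COMMAND in the background unless PIDFILE records a live instance.
# If the file was deleted (e.g. from /tmp), a second copy starts. If it
# cannot be written (wrong owner/permissions), the new PID is never
# recorded, so every later call starts yet another copy.
start_once() {
    pidfile=$1
    shift
    if already_running "$pidfile"; then
        return 0
    fi
    "$@" &
    echo $! > "$pidfile" || echo "cannot write $pidfile" >&2
}
```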
>>>
>>>          Regards,
>>>            Gerry
>>>
>>>
>>>
>>> On 20/01/2014 17:44, Javier Fontan wrote:
>>>>
>>>> I've been trying to reproduce the problem, that is, making OpenNebula
>>>> start a large number of collectd-client processes. The only way I was
>>>> able to do it was when the file "/tmp/one-collectd-client.pid" existed
>>>> and had the wrong permissions. Can you check the ownership and
>>>> permissions of that file?
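One way to answer the ownership and permissions question on a node, assuming GNU coreutils `stat` and its `-c` format option. BSD `stat` uses different flags. The helper name `pidfile_info` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report owner:group, octal mode and name of a pid
# file, or say that it is missing. Requires GNU stat (-c).
pidfile_info() {
    if [ -e "$1" ]; then
        stat -c '%U:%G %a %n' "$1"
    else
        echo "$1: missing"
    fi
}

# Example on a node:
#   pidfile_info /tmp/one-collectd-client.pid
```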
>>>>
>>>> On Mon, Jan 20, 2014 at 4:15 PM, Javier Fontan <[email protected]>
>>>> wrote:
>>>>>
>>>>> The problem seems to be the large number of collectd-client processes
>>>>> running. Try killing all "collectd-client.rb" processes. There should be
>>>>> only one running per host.
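A sketch of the "only one per host" check, assuming a POSIX shell with awk. `count_clients` and `check_single_client` are hypothetical names. They count `collectd-client.rb` lines in `ps -ef` output and return failure when more than one is found.

```shell
#!/bin/sh
# Hypothetical check: count collectd-client.rb processes in `ps -ef`
# output read from stdin (lines mentioning awk are ignored).
count_clients() {
    awk '/collectd-client\.rb/ && !/awk/ { n++ } END { print n + 0 }'
}

# Warn on stderr and return 1 when more than one client is running.
check_single_client() {
    n=$(count_clients)
    if [ "$n" -gt 1 ]; then
        echo "warning: $n collectd-client.rb processes running" >&2
        return 1
    fi
    return 0
}

# Example on a node:
#   ps -ef | check_single_client
```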
>>>>>
>>>>> In case you want to use the old method of monitoring, you can follow
>>>>> this guide:
>>>>>
>>>>> http://docs.opennebula.org/stable/administration/monitoring/imsshpullg.html#imsshpullg
>>>>>
>>>>> On Mon, Jan 20, 2014 at 2:17 PM, Gerry O'Brien <[email protected]>
>>>>> wrote:
>>>>>>
>>>>>> Hi Ruben,
>>>>>>
>>>>>>       Below is the output of 'ps -ef | grep one' on a host that has been
>>>>>> disabled, rebooted and enabled. There are multiple instances of
>>>>>> collectd-client.rb kvm running.
>>>>>>
>>>>>>
>>>>>>       Today we discovered a serious issue that is having an adverse
>>>>>> effect on our DNS system. As soon as the machine below was enabled,
>>>>>> our DNS server was flooded with requests from the host (see a sample
>>>>>> below).
>>>>>>        Our logs show that this only started happening after the
>>>>>> upgrade to 4.4. If we don't get a fix for this we will have to go back
>>>>>> to 4.2, which is something I really don't want to do.
>>>>>>
>>>>>>           Regards,
>>>>>>               Gerry
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> oneadmin  3628     1  0 13:04 ?        00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
>>>>>> oneadmin  4600     1  0 13:05 ?        00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
>>>>>> oneadmin  6400     1  0 13:07 ?        00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
>>>>>> oneadmin  9003     1  0 13:08 ?        00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
>>>>>> oneadmin 12953  3628  0 13:10 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
>>>>>> oneadmin 12955  6400  0 13:10 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
>>>>>> oneadmin 12969 12953  0 13:10 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
>>>>>> oneadmin 12970 12969  0 13:10 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
>>>>>> oneadmin 12972 12955  0 13:10 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
>>>>>> oneadmin 12973 12972  0 13:10 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
>>>>>> oneadmin 13029 12973  0 13:10 ?        00:00:00 /bin/bash ./monitor_ds.sh kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
>>>>>> oneadmin 13030 12970  0 13:10 ?        00:00:00 /bin/bash ./monitor_ds.sh kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
>>>>>>
>>>>>>
>>>>>>
>>>>>> -2014 13:14:26.675 client 134.226.59.101#52314: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:26.680 client 134.226.59.101#51356: query: host101.scss.tcd.ie IN A + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:26.680 client 134.226.59.101#51356: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:26.822 client 134.226.59.101#47870: query: host101.scss.tcd.ie IN A + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:26.822 client 134.226.59.101#47870: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:26.824 client 134.226.59.101#58734: query: host101.scss.tcd.ie IN A + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:26.825 client 134.226.59.101#58734: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:26.952 client 134.226.59.101#39659: query: host101.scss.tcd.ie IN A + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:26.952 client 134.226.59.101#39659: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:26.952 client 134.226.59.101#53975: query: host101.scss.tcd.ie IN A + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:26.953 client 134.226.59.101#53975: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:27.108 client 134.226.59.101#36294: query: host101.scss.tcd.ie IN A + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:27.108 client 134.226.59.101#36294: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:27.109 client 134.226.59.101#59277: query: host101.scss.tcd.ie IN A + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:27.109 client 134.226.59.101#59277: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:27.347 client 134.226.59.101#49614: query: host101.scss.tcd.ie IN A + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:27.348 client 134.226.59.101#49614: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:27.350 client 134.226.59.101#44058: query: host101.scss.tcd.ie IN A + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:27.357 client 134.226.59.101#44058: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:27.458 client 134.226.59.101#51830: query: host101.scss.tcd.ie IN A + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:27.458 client 134.226.59.101#51830: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:27.461 client 134.226.59.101#38419: query: host101.scss.tcd.ie IN A + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:27.461 client 134.226.59.101#38419: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:31.184 client 134.226.59.101#38617: query: host101.scss.tcd.ie IN A + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:31.184 client 134.226.59.101#38617: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
>>>>>> 20-Jan-2014 13:14:31.302 client 134.226
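To quantify a flood like the one above, here is a sketch that counts queries per second for one name. It assumes BIND query-log lines in exactly the format shown: date, time, then `client ...: query: NAME ...`. Other BIND versions lay these lines out differently. `queries_per_second` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: read a BIND query log on stdin and print
# "HH:MM:SS count" for queries of NAME (time is field 2, e.g. 13:14:26.675).
queries_per_second() {
    awk -v n="query: $1 " '
        index($0, n) { split($2, t, "."); c[t[1]]++ }  # drop milliseconds
        END { for (s in c) print s, c[s] }
    ' | sort
}

# Example (log path is an assumption):
#   queries_per_second host101.scss.tcd.ie < /var/log/named/query.log
```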
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 17/01/2014 17:45, Ruben S. Montero wrote:
>>>>>>>
>>>>>>> Hi Gerry
>>>>>>>
>>>>>>> Just to check, are you using 4.4 Final? We've seen this in the betas
>>>>>>> and "thought" we had fixed it for the final version. Also, could you
>>>>>>> check that there is just one monitoring process on each host
>>>>>>> (collectd-client.sh or equivalent should be the name of the process)?
>>>>>>>
>>>>>>> Also, could you send us the lines from oned.log between Thu Jan 16
>>>>>>> 16:56:25 2014 and Thu Jan 16 17:25:43 2014, plus the first lines,
>>>>>>> which include your oned.conf values (we are especially interested in
>>>>>>> those related to the monitoring interval)?
>>>>>>>
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>> Ruben
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jan 17, 2014 at 2:27 PM, Gerry O'Brien <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>>        Below is a truncated log file for a VM. The monitor continually
>>>>>>>> cycles between finding the machine RUNNING and state UNKNOWN. This
>>>>>>>> occurs for many machines at the same time. All machines were created
>>>>>>>> by a script.
>>>>>>>>
>>>>>>>>        The VMs are Microsoft Windows 7 64-bit Enterprise. Individual
>>>>>>>> context is created by a startup script. They run fine, but eventually
>>>>>>>> /var/log/one is going to overflow.
>>>>>>>>
>>>>>>>>        Restarting oned seems to fix the problem, but this is hardly a
>>>>>>>> long-term solution.
>>>>>>>>
>>>>>>>>        Any suggestions on what could be causing this?
>>>>>>>>
>>>>>>>>            Regards,
>>>>>>>>                Gerry
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thu Jan 16 16:56:21 2014 [DiM][I]: New VM state is ACTIVE.
>>>>>>>> Thu Jan 16 16:56:22 2014 [LCM][I]: New VM state is PROLOG.
>>>>>>>> Thu Jan 16 16:56:22 2014 [VM][I]: Virtual Machine has no context
>>>>>>>> Thu Jan 16 16:56:22 2014 [LCM][I]: New VM state is BOOT
>>>>>>>> Thu Jan 16 16:56:22 2014 [VMM][I]: Generating deployment file: /var/lib/one/vms/1788/deployment.0
>>>>>>>> Thu Jan 16 16:56:23 2014 [VMM][I]: ExitCode: 0
>>>>>>>> Thu Jan 16 16:56:23 2014 [VMM][I]: Successfully execute network driver operation: pre.
>>>>>>>> Thu Jan 16 16:56:25 2014 [VMM][I]: ExitCode: 0
>>>>>>>> Thu Jan 16 16:56:25 2014 [VMM][I]: Successfully execute virtualization driver operation: deploy.
>>>>>>>> Thu Jan 16 16:56:25 2014 [VMM][I]: ExitCode: 0
>>>>>>>> Thu Jan 16 16:56:25 2014 [VMM][I]: Successfully execute network driver operation: post.
>>>>>>>> Thu Jan 16 16:56:25 2014 [LCM][I]: New VM state is RUNNING
>>>>>>>> Thu Jan 16 16:56:51 2014 [LCM][I]: New VM state is UNKNOWN
>>>>>>>> Thu Jan 16 16:59:01 2014 [VMM][I]: VM found again, state is RUNNING
>>>>>>>> Thu Jan 16 16:59:23 2014 [LCM][I]: New VM state is UNKNOWN
>>>>>>>> Thu Jan 16 17:01:41 2014 [VMM][I]: VM found again, state is RUNNING
>>>>>>>> Thu Jan 16 17:01:58 2014 [LCM][I]: New VM state is UNKNOWN
>>>>>>>> Thu Jan 16 17:04:18 2014 [VMM][I]: VM found again, state is RUNNING
>>>>>>>> Thu Jan 16 17:04:39 2014 [LCM][I]: New VM state is UNKNOWN
>>>>>>>> Thu Jan 16 17:06:55 2014 [VMM][I]: VM found again, state is RUNNING
>>>>>>>> Thu Jan 16 17:07:06 2014 [LCM][I]: New VM state is UNKNOWN
>>>>>>>> Thu Jan 16 17:09:31 2014 [VMM][I]: VM found again, state is RUNNING
>>>>>>>> Thu Jan 16 17:09:31 2014 [LCM][I]: New VM state is UNKNOWN
>>>>>>>> Thu Jan 16 17:12:22 2014 [VMM][I]: VM found again, state is RUNNING
>>>>>>>> Thu Jan 16 17:12:27 2014 [LCM][I]: New VM state is UNKNOWN
>>>>>>>> Thu Jan 16 17:15:11 2014 [VMM][I]: VM found again, state is RUNNING
>>>>>>>> Thu Jan 16 17:15:22 2014 [LCM][I]: New VM state is UNKNOWN
>>>>>>>> Thu Jan 16 17:17:49 2014 [VMM][I]: VM found again, state is RUNNING
>>>>>>>> Thu Jan 16 17:18:00 2014 [LCM][I]: New VM state is UNKNOWN
>>>>>>>> Thu Jan 16 17:20:27 2014 [VMM][I]: VM found again, state is RUNNING
>>>>>>>> Thu Jan 16 17:20:34 2014 [LCM][I]: New VM state is UNKNOWN
>>>>>>>> Thu Jan 16 17:23:04 2014 [VMM][I]: VM found again, state is RUNNING
>>>>>>>> Thu Jan 16 17:23:08 2014 [LCM][I]: New VM state is UNKNOWN
>>>>>>>> Thu Jan 16 17:25:41 2014 [VMM][I]: VM found again, state is RUNNING
>>>>>>>> Thu Jan 16 17:25:43 2014 [LCM][I]: New VM state is UNKNOWN
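A sketch for measuring how often a VM flaps between the two states. It matches the exact messages in the log above. `count_flaps` is hypothetical, and the /var/log/one/<VM ID>.log location in the example is an assumption about a system-wide install.

```shell
#!/bin/sh
# Hypothetical helper: count UNKNOWN transitions and "found again" events
# in a VM log, using the message texts shown in this thread.
count_flaps() {
    unknown=$(grep -c 'New VM state is UNKNOWN' "$1")
    found=$(grep -c 'VM found again, state is RUNNING' "$1")
    echo "UNKNOWN: $unknown, found again: $found"
}

# Example (log location is an assumption):
#   count_flaps /var/log/one/1788.log
```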
>>>>>>>>
>>>>>>>> --
>>>>>>>> Gerry O'Brien
>>>>>>>>
>>>>>>>> Systems Manager
>>>>>>>> School of Computer Science and Statistics
>>>>>>>> Trinity College Dublin
>>>>>>>> Dublin 2
>>>>>>>> IRELAND
>>>>>>>>
>>>>>>>> 00 353 1 896 1341
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Users mailing list
>>>>>>>> [email protected]
>>>>>>>> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>>>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Javier Fontán Muiños
>>>>> Developer
>>>>> OpenNebula - The Open Source Toolkit for Data Center Virtualization
>>>>> www.OpenNebula.org | @OpenNebula | github.com/jfontan
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
>



-- 
Javier Fontán Muiños
Developer
OpenNebula - The Open Source Toolkit for Data Center Virtualization
www.OpenNebula.org | @OpenNebula | github.com/jfontan
