On Thu, Apr 14, 2016 at 6:53 PM, Richard Neuboeck <h...@tbi.univie.ac.at> wrote: > On 14.04.16 18:46, Simone Tiraboschi wrote: >> On Thu, Apr 14, 2016 at 4:04 PM, Richard Neuboeck <h...@tbi.univie.ac.at> >> wrote: >>> On 04/14/2016 02:14 PM, Simone Tiraboschi wrote: >>>> On Thu, Apr 14, 2016 at 12:51 PM, Richard Neuboeck >>>> <h...@tbi.univie.ac.at> wrote: >>>>> On 04/13/2016 10:00 AM, Simone Tiraboschi wrote: >>>>>> On Wed, Apr 13, 2016 at 9:38 AM, Richard Neuboeck >>>>>> <h...@tbi.univie.ac.at> wrote: >>>>>>> The answers file shows the setup time of both machines. >>>>>>> >>>>>>> On both machines hosted-engine.conf got rotated right before I wrote >>>>>>> this mail. Is it possible that I managed to interrupt the rotation with >>>>>>> the reboot so the backup was accurate but the update not yet written to >>>>>>> hosted-engine.conf? >>>>>> >>>>>> AFAIK we don't have any rotation mechanism for that file; is there >>>>>> something else you have in place on that host? >>>>> >>>>> Those machines are all CentOS 7.2 minimal installs. The only >>>>> adaptations I make are installing vim, removing postfix and installing >>>>> exim, removing firewalld and installing iptables-services. Then I add >>>>> the oVirt repos (3.6 and 3.6-snapshot) and deploy the host. >>>>> >>>>> But checking lsof shows that 'ovirt-ha-agent --no-daemon' has access >>>>> to the config file (and the one ending with ~): >>>>> >>>>> # lsof | grep 'hosted-engine.conf~' >>>>> ovirt-ha- 193446 vdsm 351u REG >>>>> 253,0 1021 135070683 >>>>> /etc/ovirt-hosted-engine/hosted-engine.conf~ >>>> >>>> This is not really relevant if the file was renamed after >>>> ovirt-ha-agent opened it. 
>>>> Try this: >>>> >>>> [root@c72he20160405h1 ovirt-hosted-engine-setup]# tail -n1 -f >>>> /etc/ovirt-hosted-engine/hosted-engine.conf & >>>> [1] 28866 >>>> [root@c72he20160405h1 ovirt-hosted-engine-setup]# port= >>>> >>>> [root@c72he20160405h1 ovirt-hosted-engine-setup]# lsof | grep >>>> hosted-engine.conf >>>> tail 28866 root 3r REG >>>> 253,0 1014 1595898 /etc/ovirt-hosted-engine/hosted-engine.conf >>>> [root@c72he20160405h1 ovirt-hosted-engine-setup]# mv >>>> /etc/ovirt-hosted-engine/hosted-engine.conf >>>> /etc/ovirt-hosted-engine/hosted-engine.conf_123 >>>> [root@c72he20160405h1 ovirt-hosted-engine-setup]# lsof | grep >>>> hosted-engine.conf >>>> tail 28866 root 3r REG >>>> 253,0 1014 1595898 >>>> /etc/ovirt-hosted-engine/hosted-engine.conf_123 >>>> [root@c72he20160405h1 ovirt-hosted-engine-setup]# >>>> >>> >>> I've issued the commands you suggested but I don't know how that >>> helps to find the process accessing the config files. >>> >>> After moving the hosted-engine.conf file the HA agent crashed >>> logging the information that the config file is not available. 
>>> >>> Here is the output from every command: >>> >>> # tail -n1 -f /etc/ovirt-hosted-engine/hosted-engine.conf & >>> [1] 167865 >>> [root@cube-two ~]# port= >>> # lsof | grep hosted-engine.conf >>> ovirt-ha- 166609 vdsm 5u REG >>> 253,0 1021 134433491 >>> /etc/ovirt-hosted-engine/hosted-engine.conf (deleted) >>> ovirt-ha- 166609 vdsm 7u REG >>> 253,0 1021 134433453 >>> /etc/ovirt-hosted-engine/hosted-engine.conf (deleted) >>> ovirt-ha- 166609 vdsm 8u REG >>> 253,0 1021 134433489 >>> /etc/ovirt-hosted-engine/hosted-engine.conf (deleted) >>> ovirt-ha- 166609 vdsm 9u REG >>> 253,0 1021 134433493 >>> /etc/ovirt-hosted-engine/hosted-engine.conf~ >>> ovirt-ha- 166609 vdsm 10u REG >>> 253,0 1021 134433495 >>> /etc/ovirt-hosted-engine/hosted-engine.conf >>> tail 167865 root 3r REG >>> 253,0 1021 134433493 >>> /etc/ovirt-hosted-engine/hosted-engine.conf~ >>> # mv /etc/ovirt-hosted-engine/hosted-engine.conf >>> /etc/ovirt-hosted-engine/hosted-engine.conf_123 >>> # lsof | grep hosted-engine.conf >>> ovirt-ha- 166609 vdsm 5u REG >>> 253,0 1021 134433491 >>> /etc/ovirt-hosted-engine/hosted-engine.conf (deleted) >>> ovirt-ha- 166609 vdsm 7u REG >>> 253,0 1021 134433453 >>> /etc/ovirt-hosted-engine/hosted-engine.conf (deleted) >>> ovirt-ha- 166609 vdsm 8u REG >>> 253,0 1021 134433489 >>> /etc/ovirt-hosted-engine/hosted-engine.conf (deleted) >>> ovirt-ha- 166609 vdsm 9u REG >>> 253,0 1021 134433493 >>> /etc/ovirt-hosted-engine/hosted-engine.conf (deleted) >>> ovirt-ha- 166609 vdsm 10u REG >>> 253,0 1021 134433495 >>> /etc/ovirt-hosted-engine/hosted-engine.conf (deleted) >>> ovirt-ha- 166609 vdsm 12u REG >>> 253,0 1021 134433498 >>> /etc/ovirt-hosted-engine/hosted-engine.conf~ >>> ovirt-ha- 166609 vdsm 13u REG >>> 253,0 1021 134433499 >>> /etc/ovirt-hosted-engine/hosted-engine.conf_123 >>> tail 167865 root 3r REG >>> 253,0 1021 134433493 >>> /etc/ovirt-hosted-engine/hosted-engine.conf (deleted) >>> >>> >>>> The issue is understanding who renames that file on your host. 
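The lsof listings above show the same open descriptor under a new name (or marked deleted) once the file is moved. A minimal Python sketch of that effect follows, assuming a Linux host with /proc mounted; the scratch directory and the helper names `path_of_fd` and `rename_while_open` are hypothetical and are not part of oVirt.

```python
import os
import shutil
import tempfile


def path_of_fd(fd):
    """Return the path the kernel reports for an open descriptor (Linux /proc only)."""
    return os.readlink("/proc/self/fd/%d" % fd)


def rename_while_open():
    """Open a file, rename it, and report what the descriptor points at before and after."""
    # Scratch directory standing in for /etc/ovirt-hosted-engine (hypothetical).
    workdir = os.path.realpath(tempfile.mkdtemp())
    original = os.path.join(workdir, "hosted-engine.conf")
    renamed = original + "_123"
    try:
        with open(original, "w") as handle:
            handle.write("port=\n")
        fd = os.open(original, os.O_RDONLY)
        try:
            inode_before = os.fstat(fd).st_ino
            name_before = path_of_fd(fd)
            # Same step as the mv in the lsof experiment.
            os.rename(original, renamed)
            name_after = path_of_fd(fd)
            inode_after = os.fstat(fd).st_ino
        finally:
            os.close(fd)
    finally:
        shutil.rmtree(workdir)
    return name_before, name_after, inode_before == inode_after


if __name__ == "__main__":
    before, after, same_inode = rename_while_open()
    print(before)
    print(after)
    print("same inode:", same_inode)
```

The descriptor keeps pointing at the same inode, so the name lsof prints only tells you what the file is called now, not what it was called when it was opened.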
>>> >>> From what I've seen so far it looks like a child of vdsm accesses >>> /etc/ovirt-hosted-engine/hosted-engine.conf periodically but is not >>> responsible for the ~ file. >>> >>> # auditctl -w /etc/ovirt-hosted-engine/hosted-engine.conf >>> and >>> # auditctl -w /etc/ovirt-hosted-engine/hosted-engine.conf~ >>> >>> auditd.log shows this: >>> >>> type=SYSCALL msg=audit(1460639783.613:482590): arch=c000003e >>> syscall=2 success=yes exit=75 a0=7f29b400f0b0 a1=0 a2=1b6 a3=24 >>> items=1 ppid=1 pid=3701 auid=4294967295 uid=36 gid=36 euid=36 >>> suid=36 fsuid=36 egid=36 sgid=36 fsgid=36 tty=(none) ses=4294967295 >>> comm="jsonrpc.Executo" exe="/usr/bin/python2.7" >>> subj=system_u:system_r:virtd_t:s0-s0:c0.c1023 key=(null) >>> type=CWD msg=audit(1460639783.613:482590): cwd="/" >>> type=PATH msg=audit(1460639783.613:482590): item=0 >>> name="/etc/ovirt-hosted-engine/hosted-engine.conf" inode=134433499 >>> dev=fd:00 mode=0100644 ouid=0 ogid=0 rdev=00:00 >>> obj=system_u:object_r:etc_t:s0 objtype=NORMAL >>> >>> >>> Now that the HA agent is dead I'm removing the ~ file and starting >>> the HA agent again. The ~ file immediately appears again. >>> >>> # rm hosted-engine.conf~ >>> rm: remove regular file ‘hosted-engine.conf~’? y >>> [root@cube-two ovirt-hosted-engine]# ls -l >>> total 6800 >>> -rw-r--r--. 1 root root 3252 Apr 8 10:35 answers.conf >>> -rw-r--r--. 1 root root 6948582 Apr 14 14:48 ha-trace.log >>> -rw-r--r--. 1 root root 1021 Apr 14 15:07 hosted-engine.conf >>> -rw-r--r--. 1 root root 413 Apr 8 10:35 iptables.example >>> [root@cube-two ovirt-hosted-engine]# systemctl start ovirt-ha-agent >>> [root@cube-two ovirt-hosted-engine]# ls -l >>> total 6804 >>> -rw-r--r--. 1 root root 3252 Apr 8 10:35 answers.conf >>> -rw-r--r--. 1 root root 6948582 Apr 14 14:48 ha-trace.log >>> -rw-r--r--. 1 root root 1021 Apr 14 15:18 hosted-engine.conf >>> -rw-r--r--. 1 root root 1021 Apr 14 15:07 hosted-engine.conf~ >>> -rw-r--r--. 
1 root root 413 Apr 8 10:35 iptables.example >>> >>> The auditd.log shows that the ~ file is moved into place, but not what >>> issued the mv: >>> >>> type=CONFIG_CHANGE msg=audit(1460639919.277:482750): auid=4294967295 >>> ses=4294967295 op="updated_rules" >>> path="/etc/ovirt-hosted-engine/hosted-engine.conf~" key=(null) >>> list=4 res=1 >>> type=SYSCALL msg=audit(1460639919.277:482751): arch=c000003e >>> syscall=82 success=yes exit=0 a0=7ffe4b3c0e90 a1=7ffe4b3bf920 >>> a2=7f68083a2778 a3=7ffe4b3bf680 items=5 ppid=170233 pid=170234 >>> auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 >>> sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="mv" >>> exe="/usr/bin/mv" subj=system_u:system_r:unconfined_service_t:s0 >>> key=(null) >>> type=CWD msg=audit(1460639919.277:482751): cwd="/" >>> type=PATH msg=audit(1460639919.277:482751): item=0 >>> name="/etc/ovirt-hosted-engine/" inode=69555 dev=fd:00 mode=040755 >>> ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:etc_t:s0 objtype=PARENT >>> type=PATH msg=audit(1460639919.277:482751): item=1 >>> name="/etc/ovirt-hosted-engine/" inode=69555 dev=fd:00 mode=040755 >>> ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:etc_t:s0 objtype=PARENT >>> type=PATH msg=audit(1460639919.277:482751): item=2 >>> name="/etc/ovirt-hosted-engine/hosted-engine.conf" inode=134433453 >>> dev=fd:00 mode=0100644 ouid=0 ogid=0 rdev=00:00 >>> obj=system_u:object_r:etc_t:s0 objtype=DELETE >>> type=PATH msg=audit(1460639919.277:482751): item=3 >>> name="/etc/ovirt-hosted-engine/hosted-engine.conf~" inode=134433499 >>> dev=fd:00 mode=0100644 ouid=0 ogid=0 rdev=00:00 >>> obj=system_u:object_r:etc_t:s0 objtype=DELETE >>> type=PATH msg=audit(1460639919.277:482751): item=4 >>> name="/etc/ovirt-hosted-engine/hosted-engine.conf~" inode=134433453 >>> dev=fd:00 mode=0100644 ouid=0 ogid=0 rdev=00:00 >>> obj=system_u:object_r:etc_t:s0 objtype=CREATE >>> >>> >>>> As a rule of thumb, if a file name ends with a tilde (~), it only >>>> means that it is a backup created 
by a text editor or similar program. >>> >>> If anyone except me had access to these systems I would >>> guess the same. But since I'm not editing anything in >>> /etc/ovirt-hosted-engine there must be another reason. And there is. >>> >>> Aside from auditd I tried to strace the whole thing just to make >>> sure it comes from the HA agent. >>> >>> [root@cube-two ~]# strace -o ha-trace.log -f >>> /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon >>> >>> Looking at the trace log I found this: >>> >>> 183409 statfs("/etc/ovirt-hosted-engine/.", {f_type=0x58465342, >>> f_bsize=4096, f_blocks=13100800, f_bfree=12523576, >>> f_bavail=12523576, f_files=52428800, f_ffree=52379892, >>> f_fsid={64768, 0}, f_namelen=255, f_frsize=4096}) = 0 >>> 183409 rename("/etc/ovirt-hosted-engine/hosted-engine.conf", >>> "/etc/ovirt-hosted-engine/hosted-engine.conf~") = 0 >>> 183409 rename("/var/lib/ovirt-hosted-engine-ha/tmpNjTElr", >>> "/etc/ovirt-hosted-engine/hosted-engine.conf") = 0 >>> 183409 newfstatat(AT_FDCWD, >>> "/etc/ovirt-hosted-engine/hosted-engine.conf", >>> {st_mode=S_IFREG|0600, st_size=1021, ...}, AT_SYMLINK_NOFOLLOW) = 0 >>> 183409 open("/etc/ovirt-hosted-engine/hosted-engine.conf", >>> O_RDONLY|O_NOFOLLOW) = 3 >>> >>> >>> Putting it all together I started reading the HA agent sources and >>> found the function _wrote_updated_conf_file in >>> /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/upgrade.py >>> which issues a mv -b, which creates the ~ file. >> >> This should only trigger during a 3.5 to 3.6 upgrade, but your hosts are new. >> Can you please attach /var/log/ovirt-hosted-engine-ha/agent.log from >> one of them? > > The agent.log of host cube-two is attached to this mail.
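The two rename() calls in the strace output quoted above are what a backup-keeping move looks like at the syscall level. The sketch below approximates that sequence in Python to show why an identical ~ copy appears on every rewrite. It is not the agent's code: `replace_with_backup` and `demo` are hypothetical names, and the temp file is created next to the target here, whereas the trace shows it under /var/lib/ovirt-hosted-engine-ha.

```python
import os
import shutil
import tempfile


def replace_with_backup(target, content):
    """Write content to a temp file, then move it over target, keeping the old file as target~."""
    directory = os.path.dirname(target)
    # Assumption: temp file lives in the same directory so rename() stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        tmp.write(content)
    if os.path.exists(target):
        # First rename seen in the trace: hosted-engine.conf -> hosted-engine.conf~
        os.rename(target, target + "~")
    # Second rename seen in the trace: temp file -> hosted-engine.conf
    os.rename(tmp_path, target)


def demo():
    """Rewrite the same content twice and report whether a backup exists and matches."""
    workdir = tempfile.mkdtemp()
    target = os.path.join(workdir, "hosted-engine.conf")
    try:
        replace_with_backup(target, "key=value\n")
        backup_after_first = os.path.exists(target + "~")
        replace_with_backup(target, "key=value\n")
        backup_after_second = os.path.exists(target + "~")
        with open(target) as current, open(target + "~") as backup:
            identical = current.read() == backup.read()
    finally:
        shutil.rmtree(workdir)
    return backup_after_first, backup_after_second, identical


if __name__ == "__main__":
    print(demo())
```

Rewriting unchanged content still produces a fresh backup, which would be consistent with the empty diff between hosted-engine.conf and hosted-engine.conf~ reported in this thread.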
Yes, you are right: it's looping while trying to fix a path in the config file (on 3.5 we didn't check whether an NFS path ended with a '/', which for other reasons doesn't work on 3.6, so we need to fix it), but that doesn't seem to be your case, hence the strange loop.

Now I need to understand why it enters there. Can you please execute tree /rhev/data-center/ and send me the output?

Thanks again

>>> The question now is why this is done so frequently, especially >>> considering there are no modifications to the file. Is this >>> behavior normal? >>> >>> [root@cube-two ~]# diff /etc/ovirt-hosted-engine/hosted-engine.conf* >>> [root@cube-two ~]# >>> >>> >>>>>>> [root@cube-two ~]# ls -l /etc/ovirt-hosted-engine >>>>>>> total 16 >>>>>>> -rw-r--r--. 1 root root 3252 Apr 8 10:35 answers.conf >>>>>>> -rw-r--r--. 1 root root 1021 Apr 13 09:31 hosted-engine.conf >>>>>>> -rw-r--r--. 1 root root 1021 Apr 13 09:30 hosted-engine.conf~ >>>>>>> >>>>>>> [root@cube-three ~]# ls -l /etc/ovirt-hosted-engine >>>>>>> total 16 >>>>>>> -rw-r--r--. 1 root root 3233 Apr 11 08:02 answers.conf >>>>>>> -rw-r--r--. 1 root root 1002 Apr 13 09:31 hosted-engine.conf >>>>>>> -rw-r--r--. 1 root root 1002 Apr 13 09:31 hosted-engine.conf~ >>>>>>> >>>>>>> On 12.04.16 16:01, Simone Tiraboschi wrote: >>>>>>>> Everything seems fine here, >>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf seems to be correctly >>>>>>>> created with the right name. >>>>>>>> Can you please check the latest modification time of your >>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf~ and compare it with the >>>>>>>> setup time? 
>>>>>>>> >>>>>>>> On Tue, Apr 12, 2016 at 2:34 PM, Richard Neuboeck >>>>>>>> <h...@tbi.univie.ac.at> wrote: >>>>>>>>> On 04/12/2016 11:32 AM, Simone Tiraboschi wrote: >>>>>>>>>> On Mon, Apr 11, 2016 at 8:11 AM, Richard Neuboeck >>>>>>>>>> <h...@tbi.univie.ac.at> wrote: >>>>>>>>>>> Hi oVirt Group, >>>>>>>>>>> >>>>>>>>>>> in my attempts to get all aspects of oVirt 3.6 up and running I >>>>>>>>>>> stumbled upon something I'm not sure how to fix: >>>>>>>>>>> >>>>>>>>>>> Initially I installed a hosted engine setup. After that I added >>>>>>>>>>> another HA host (with hosted-engine --deploy). The host was >>>>>>>>>>> registered in the Engine correctly and HA agent came up as expected. >>>>>>>>>>> >>>>>>>>>>> However if I reboot the second host (through the Engine UI or >>>>>>>>>>> manually) HA agent fails to start. The reason seems to be that >>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf is empty. The backup >>>>>>>>>>> file ending with ~ exists though. >>>>>>>>>> >>>>>>>>>> Can you please attach hosted-engine-setup logs from your additional >>>>>>>>>> hosts? >>>>>>>>>> AFAIK our code will never take a ~ ending backup of that file. >>>>>>>>> >>>>>>>>> ovirt-hosted-engine-setup logs from both additional hosts are >>>>>>>>> attached to this mail. >>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Here are the log messages from the journal: >>>>>>>>>>> Apr 11 07:29:39 cube-two.tbi.univie.ac.at systemd[1]: Starting oVirt >>>>>>>>>>> Hosted Engine High Availability Monitoring Agent... 
>>>>>>>>>>> Apr 11 07:29:39 cube-two.tbi.univie.ac.at ovirt-ha-agent[3747]: >>>>>>>>>>> INFO:ovirt_hosted_engine_ha.agent.agent.Agent:ovirt-hosted-engine-ha >>>>>>>>>>> agent 1.3.5.3-0.0.master started >>>>>>>>>>> Apr 11 07:29:39 cube-two.tbi.univie.ac.at ovirt-ha-agent[3747]: >>>>>>>>>>> INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Found >>>>>>>>>>> certificate common name: cube-two.tbi.univie.ac.at >>>>>>>>>>> Apr 11 07:29:39 cube-two.tbi.univie.ac.at ovirt-ha-agent[3747]: >>>>>>>>>>> ovirt-ha-agent >>>>>>>>>>> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Hosted >>>>>>>>>>> Engine is not configured. Shutting down. >>>>>>>>>>> Apr 11 07:29:39 cube-two.tbi.univie.ac.at ovirt-ha-agent[3747]: >>>>>>>>>>> ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Hosted >>>>>>>>>>> Engine is not configured. Shutting down. >>>>>>>>>>> Apr 11 07:29:39 cube-two.tbi.univie.ac.at ovirt-ha-agent[3747]: >>>>>>>>>>> INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down >>>>>>>>>>> Apr 11 07:29:39 cube-two.tbi.univie.ac.at systemd[1]: >>>>>>>>>>> ovirt-ha-agent.service: main process exited, code=exited, >>>>>>>>>>> status=255/n/a >>>>>>>>>>> >>>>>>>>>>> If I restore the configuration from the backup file and manually >>>>>>>>>>> restart the HA agent it's working properly. >>>>>>>>>>> >>>>>>>>>>> For testing purposes I added a third HA host which turned out to >>>>>>>>>>> behave exactly the same. >>>>>>>>>>> >>>>>>>>>>> Any help would be appreciated! 
>>>>>>>>>>> Thanks >>>>>>>>>>> Cheers >>>>>>>>>>> Richard >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> /dev/null >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> Users mailing list >>>>>>>>>>> Users@ovirt.org >>>>>>>>>>> http://lists.ovirt.org/mailman/listinfo/users