On 04/14/2016 02:14 PM, Simone Tiraboschi wrote: > On Thu, Apr 14, 2016 at 12:51 PM, Richard Neuboeck > <[email protected]> wrote: >> On 04/13/2016 10:00 AM, Simone Tiraboschi wrote: >>> On Wed, Apr 13, 2016 at 9:38 AM, Richard Neuboeck <[email protected]> >>> wrote: >>>> The answers file shows the setup time of both machines. >>>> >>>> On both machines hosted-engine.conf got rotated right before I wrote >>>> this mail. Is it possible that I managed to interrupt the rotation with >>>> the reboot so the backup was accurate but the update not yet written to >>>> hosted-engine.conf? >>> >>> AFAIK we don't have any rotation mechanism for that file; something >>> else you have in place on that host? >> >> Those machines are all CentOS 7.2 minimal installs. The only >> adaptation I do is installing vim, removing postfix and installing >> exim, removing firewalld and installing iptables-service. Then I add >> the oVirt repos (3.6 and 3.6-snapshot) and deploy the host. >> >> But checking lsof shows that 'ovirt-ha-agent --no-daemon' has access >> to the config file (and the one ending with ~): >> >> # lsof | grep 'hosted-engine.conf~' >> ovirt-ha- 193446 vdsm 351u REG >> 253,0 1021 135070683 >> /etc/ovirt-hosted-engine/hosted-engine.conf~ > > This is not that much relevant if the file was renamed after > ovirt-ha-agent opened it. > Try this: > > [root@c72he20160405h1 ovirt-hosted-engine-setup]# tail -n1 -f > /etc/ovirt-hosted-engine/hosted-engine.conf & > [1] 28866 > [root@c72he20160405h1 ovirt-hosted-engine-setup]# port= > > [root@c72he20160405h1 ovirt-hosted-engine-setup]# lsof | grep > hosted-engine.conf > tail 28866 root 3r REG > 253,0 1014 1595898 /etc/ovirt-hosted-engine/hosted-engine.conf > [root@c72he20160405h1 ovirt-hosted-engine-setup]# mv > /etc/ovirt-hosted-engine/hosted-engine.conf > /etc/ovirt-hosted-engine/hosted-engine.conf_123 > [root@c72he20160405h1 ovirt-hosted-engine-setup]# lsof | grep > hosted-engine.conf > tail 28866 root 3r REG > 253,0 1014 1595898 > /etc/ovirt-hosted-engine/hosted-engine.conf_123 > [root@c72he20160405h1 ovirt-hosted-engine-setup]# >
I've issued the commands you suggested but I don't know how that
helps to find the process accessing the config files.
After moving the hosted-engine.conf file the HA agent crashed
logging the information that the config file is not available.
Here is the output from every command:
# tail -n1 -f /etc/ovirt-hosted-engine/hosted-engine.conf &
[1] 167865
[root@cube-two ~]# port=
# lsof | grep hosted-engine.conf
ovirt-ha- 166609 vdsm 5u REG
253,0 1021 134433491
/etc/ovirt-hosted-engine/hosted-engine.conf (deleted)
ovirt-ha- 166609 vdsm 7u REG
253,0 1021 134433453
/etc/ovirt-hosted-engine/hosted-engine.conf (deleted)
ovirt-ha- 166609 vdsm 8u REG
253,0 1021 134433489
/etc/ovirt-hosted-engine/hosted-engine.conf (deleted)
ovirt-ha- 166609 vdsm 9u REG
253,0 1021 134433493
/etc/ovirt-hosted-engine/hosted-engine.conf~
ovirt-ha- 166609 vdsm 10u REG
253,0 1021 134433495
/etc/ovirt-hosted-engine/hosted-engine.conf
tail 167865 root 3r REG
253,0 1021 134433493
/etc/ovirt-hosted-engine/hosted-engine.conf~
# mv /etc/ovirt-hosted-engine/hosted-engine.conf
/etc/ovirt-hosted-engine/hosted-engine.conf_123
# lsof | grep hosted-engine.conf
ovirt-ha- 166609 vdsm 5u REG
253,0 1021 134433491
/etc/ovirt-hosted-engine/hosted-engine.conf (deleted)
ovirt-ha- 166609 vdsm 7u REG
253,0 1021 134433453
/etc/ovirt-hosted-engine/hosted-engine.conf (deleted)
ovirt-ha- 166609 vdsm 8u REG
253,0 1021 134433489
/etc/ovirt-hosted-engine/hosted-engine.conf (deleted)
ovirt-ha- 166609 vdsm 9u REG
253,0 1021 134433493
/etc/ovirt-hosted-engine/hosted-engine.conf (deleted)
ovirt-ha- 166609 vdsm 10u REG
253,0 1021 134433495
/etc/ovirt-hosted-engine/hosted-engine.conf (deleted)
ovirt-ha- 166609 vdsm 12u REG
253,0 1021 134433498
/etc/ovirt-hosted-engine/hosted-engine.conf~
ovirt-ha- 166609 vdsm 13u REG
253,0 1021 134433499
/etc/ovirt-hosted-engine/hosted-engine.conf_123
tail 167865 root 3r REG
253,0 1021 134433493
/etc/ovirt-hosted-engine/hosted-engine.conf (deleted)
> The issue is understanding who renames that file on your host.
From what I've seen so far it looks like a child of vdsm accesses
/etc/ovirt-hosted-engine/hosted-engine.conf periodically but is not
responsible for the ~ file.
# auditctl -w /etc/ovirt-hosted-engine/hosted-engine.conf
and
# auditctl -w /etc/ovirt-hosted-engine/hosted-engine.conf~
auditd.log shows this:
type=SYSCALL msg=audit(1460639783.613:482590): arch=c000003e
syscall=2 success=yes exit=75 a0=7f29b400f0b0 a1=0 a2=1b6 a3=24
items=1 ppid=1 pid=3701 auid=4294967295 uid=36 gid=36 euid=36
suid=36 fsuid=36 egid=36 sgid=36 fsgid=36 tty=(none) ses=4294967295
comm="jsonrpc.Executo" exe="/usr/bin/python2.7"
subj=system_u:system_r:virtd_t:s0-s0:c0.c1023 key=(null)
type=CWD msg=audit(1460639783.613:482590): cwd="/"
type=PATH msg=audit(1460639783.613:482590): item=0
name="/etc/ovirt-hosted-engine/hosted-engine.conf" inode=134433499
dev=fd:00 mode=0100644 ouid=0 ogid=0 rdev=00:00
obj=system_u:object_r:etc_t:s0 objtype=NORMAL
Now that the HA agent is dead I'm removing the ~ file and starting
the HA agent again. The ~ file immediately appears again.
# rm hosted-engine.conf~
rm: remove regular file ‘hosted-engine.conf~’? y
[root@cube-two ovirt-hosted-engine]# ls -l
total 6800
-rw-r--r--. 1 root root 3252 Apr 8 10:35 answers.conf
-rw-r--r--. 1 root root 6948582 Apr 14 14:48 ha-trace.log
-rw-r--r--. 1 root root 1021 Apr 14 15:07 hosted-engine.conf
-rw-r--r--. 1 root root 413 Apr 8 10:35 iptables.example
[root@cube-two ovirt-hosted-engine]# systemctl start ovirt-ha-agent
[root@cube-two ovirt-hosted-engine]# ls -l
total 6804
-rw-r--r--. 1 root root 3252 Apr 8 10:35 answers.conf
-rw-r--r--. 1 root root 6948582 Apr 14 14:48 ha-trace.log
-rw-r--r--. 1 root root 1021 Apr 14 15:18 hosted-engine.conf
-rw-r--r--. 1 root root 1021 Apr 14 15:07 hosted-engine.conf~
-rw-r--r--. 1 root root 413 Apr 8 10:35 iptables.example
The auditd.log shows that ~ file is moved into place but not what
issued the mv:
type=CONFIG_CHANGE msg=audit(1460639919.277:482750): auid=4294967295
ses=4294967295 op="updated_rules"
path="/etc/ovirt-hosted-engine/hosted-engine.conf~" key=(null)
list=4 res=1
type=SYSCALL msg=audit(1460639919.277:482751): arch=c000003e
syscall=82 success=yes exit=0 a0=7ffe4b3c0e90 a1=7ffe4b3bf920
a2=7f68083a2778 a3=7ffe4b3bf680 items=5 ppid=170233 pid=170234
auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 eg
id=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="mv"
exe="/usr/bin/mv" subj=system_u:system_r:unconfined_service_t:s0
key=(null)
type=CWD msg=audit(1460639919.277:482751): cwd="/"
type=PATH msg=audit(1460639919.277:482751): item=0
name="/etc/ovirt-hosted-engine/" inode=69555 dev=fd:00 mode=040755
ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:etc_t:s0 objtype=PARENT
type=PATH msg=audit(1460639919.277:482751): item=1
name="/etc/ovirt-hosted-engine/" inode=69555 dev=fd:00 mode=040755
ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:etc_t:s0 objtype=PARENT
type=PATH msg=audit(1460639919.277:482751): item=2
name="/etc/ovirt-hosted-engine/hosted-engine.conf" inode=134433453
dev=fd:00 mode=0100644 ouid=0 ogid=0 rdev=00:00
obj=system_u:object_r:etc_t:s0 objtype=DELETE
type=PATH msg=audit(1460639919.277:482751): item=3
name="/etc/ovirt-hosted-engine/hosted-engine.conf~" inode=134433499
dev=fd:00 mode=0100644 ouid=0 ogid=0 rdev=00:00
obj=system_u:object_r:etc_t:s0 objtype=DELETE
type=PATH msg=audit(1460639919.277:482751): item=4
name="/etc/ovirt-hosted-engine/hosted-engine.conf~" inode=134433453
dev=fd:00 mode=0100644 ouid=0 ogid=0 rdev=00:00
obj=system_u:object_r:etc_t:s0 objtype=CREATE
> As a thumb rule, if a file name is appended with a tilde~, it only
> means that it is a backup created by a text editor or similar program.
If anyone except myself would have access to these systems I would
guess the same. But since I'm not editing anything in
/etc/ovirt-hosted-engine there must be another reason. And there is.
Aside from auditd I tried to strace the whole thing just to make
sure it comes from the HA agent.
[root@cube-two ~]# strace -o ha-trace.log -f
/usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon
Looking at the trace log I found this:
183409 statfs("/etc/ovirt-hosted-engine/.", {f_type=0x58465342,
f_bsize=4096, f_blocks=13100800, f_bfree=12523576,
f_bavail=12523576, f_files=52428800, f_ffree=52379892,
f_fsid={64768, 0}, f_namelen=255, f_frsize=4096}) = 0
183409 rename("/etc/ovirt-hosted-engine/hosted-engine.conf",
"/etc/ovirt-hosted-engine/hosted-engine.conf~") = 0
183409 rename("/var/lib/ovirt-hosted-engine-ha/tmpNjTElr",
"/etc/ovirt-hosted-engine/hosted-engine.conf") = 0
183409 newfstatat(AT_FDCWD,
"/etc/ovirt-hosted-engine/hosted-engine.conf",
{st_mode=S_IFREG|0600, st_size=1021, ...}, AT_SYMLINK_NOFOLLOW) = 0
183409 open("/etc/ovirt-hosted-engine/hosted-engine.conf",
O_RDONLY|O_NOFOLLOW) = 3
Putting it all together I started reading the HA agent sources and
found the function _wrote_updated_conf_file in
/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/upgrade.py
which issues a mv -b which creates the ~ file.
The question now is why is this done so frequently. Especially
considering since there are no modifications to the file. Is this
behavior normal?
[root@cube-two ~]# diff /etc/ovirt-hosted-engine/hosted-engine.conf*
[root@cube-two ~]#
>>>> [root@cube-two ~]# ls -l /etc/ovirt-hosted-engine
>>>> total 16
>>>> -rw-r--r--. 1 root root 3252 Apr 8 10:35 answers.conf
>>>> -rw-r--r--. 1 root root 1021 Apr 13 09:31 hosted-engine.conf
>>>> -rw-r--r--. 1 root root 1021 Apr 13 09:30 hosted-engine.conf~
>>>>
>>>> [root@cube-three ~]# ls -l /etc/ovirt-hosted-engine
>>>> total 16
>>>> -rw-r--r--. 1 root root 3233 Apr 11 08:02 answers.conf
>>>> -rw-r--r--. 1 root root 1002 Apr 13 09:31 hosted-engine.conf
>>>> -rw-r--r--. 1 root root 1002 Apr 13 09:31 hosted-engine.conf~
>>>>
>>>> On 12.04.16 16:01, Simone Tiraboschi wrote:
>>>>> Everything seams fine here,
>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf seams to be correctly
>>>>> created with the right name.
>>>>> Can you please check the latest modification time of your
>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf~ and compare it with the
>>>>> setup time?
>>>>>
>>>>> On Tue, Apr 12, 2016 at 2:34 PM, Richard Neuboeck <[email protected]>
>>>>> wrote:
>>>>>> On 04/12/2016 11:32 AM, Simone Tiraboschi wrote:
>>>>>>> On Mon, Apr 11, 2016 at 8:11 AM, Richard Neuboeck
>>>>>>> <[email protected]> wrote:
>>>>>>>> Hi oVirt Group,
>>>>>>>>
>>>>>>>> in my attempts to get all aspects of oVirt 3.6 up and running I
>>>>>>>> stumbled upon something I'm not sure how to fix:
>>>>>>>>
>>>>>>>> Initially I installed a hosted engine setup. After that I added
>>>>>>>> another HA host (with hosted-engine --deploy). The host was
>>>>>>>> registered in the Engine correctly and HA agent came up as expected.
>>>>>>>>
>>>>>>>> However if I reboot the second host (through the Engine UI or
>>>>>>>> manually) HA agent fails to start. The reason seems to be that
>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf is empty. The backup
>>>>>>>> file ending with ~ exists though.
>>>>>>>
>>>>>>> Can you please attach hosted-engine-setup logs from your additional
>>>>>>> hosts?
>>>>>>> AFAIK our code will never take a ~ ending backup of that file.
>>>>>>
>>>>>> ovirt-hosted-engine-setup logs from both additional hosts are
>>>>>> attached to this mail.
>>>>>>
>>>>>>>
>>>>>>>> Here are the log messages from the journal:
>>>>>>>> Apr 11 07:29:39 cube-two.tbi.univie.ac.at systemd[1]: Starting oVirt
>>>>>>>> Hosted Engine High Availability Monitoring Agent...
>>>>>>>> Apr 11 07:29:39 cube-two.tbi.univie.ac.at ovirt-ha-agent[3747]:
>>>>>>>> INFO:ovirt_hosted_engine_ha.agent.agent.Agent:ovirt-hosted-engine-ha
>>>>>>>> agent 1.3.5.3-0.0.master started
>>>>>>>> Apr 11 07:29:39 cube-two.tbi.univie.ac.at ovirt-ha-agent[3747]:
>>>>>>>> INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Found
>>>>>>>> certificate common name: cube-two.tbi.univie.ac.at
>>>>>>>> Apr 11 07:29:39 cube-two.tbi.univie.ac.at ovirt-ha-agent[3747]:
>>>>>>>> ovirt-ha-agent
>>>>>>>> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Hosted
>>>>>>>> Engine is not configured. Shutting down.
>>>>>>>> Apr 11 07:29:39 cube-two.tbi.univie.ac.at ovirt-ha-agent[3747]:
>>>>>>>> ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Hosted
>>>>>>>> Engine is not configured. Shutting down.
>>>>>>>> Apr 11 07:29:39 cube-two.tbi.univie.ac.at ovirt-ha-agent[3747]:
>>>>>>>> INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down
>>>>>>>> Apr 11 07:29:39 cube-two.tbi.univie.ac.at systemd[1]:
>>>>>>>> ovirt-ha-agent.service: main process exited, code=exited,
>>>>>>>> status=255/n/a
>>>>>>>>
>>>>>>>> If I restore the configuration from the backup file and manually
>>>>>>>> restart the HA agent it's working properly.
>>>>>>>>
>>>>>>>> For testing purposes I added a third HA host which turn out to
>>>>>>>> behave exactly the same.
>>>>>>>>
>>>>>>>> Any help would be appreciated!
>>>>>>>> Thanks
>>>>>>>> Cheers
>>>>>>>> Richard
>>>>>>>>
>>>>>>>> --
>>>>>>>> /dev/null
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Users mailing list
>>>>>>>> [email protected]
>>>>>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> /dev/null
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Users mailing list
>>>> [email protected]
>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>
>>
>>
>> --
>> /dev/null
>>
--
/dev/null
signature.asc
Description: OpenPGP digital signature
_______________________________________________ Users mailing list [email protected] http://lists.ovirt.org/mailman/listinfo/users

