Re: [ovirt-users] Debugging why hosted engine flips between EngineUp and EngineBadHealth

2017-12-13 Thread Luca 'remix_tj' Lorenzetto
On 13 Dec 2017 8:19 PM, "Yaniv Kaul" wrote:



On Wed, Dec 13, 2017 at 4:15 PM, Luca 'remix_tj' Lorenzetto <
lorenzetto.l...@gmail.com> wrote:

> [...]

Can you make sure the engine FQDN doesn't have multiple IPs registered in
DNS? dig or a similar tool should help.
Y.



No, it doesn't. A single IP is registered. It is definitely a DNS query
that is not getting its replies.

I'm debugging what's happening with the network team.

Anyway, I think the broker log in debug mode should help identify the
source of these errors. Explaining more precisely why the liveness check
failed might cut down on trial-and-error troubleshooting.
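As an illustration of that suggestion, here is a minimal Python 3 sketch of a liveness probe that returns a reason along with the verdict. This lets a DNS failure be reported differently from a refused connection or a bad HTTP status. It is not the ovirt-hosted-engine-ha code. The function name check_liveness and its opener parameter are hypothetical; opener exists only so the probe can be tested without a network.

```python
import socket
import urllib.error
import urllib.request


def check_liveness(url, timeout=5.0, opener=urllib.request.urlopen):
    """Hypothetical liveness probe that explains why it considers health bad."""
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (checked before
        # URLError because HTTPError is a subclass of it).
        return {"health": "bad", "reason": "HTTP status %d" % exc.code}
    except urllib.error.URLError as exc:
        # urlopen wraps low-level socket errors; a gaierror means the name
        # could not be resolved at all.
        if isinstance(exc.reason, socket.gaierror):
            return {"health": "bad",
                    "reason": "name resolution failed: %s" % exc.reason}
        return {"health": "bad", "reason": "connection failed: %s" % exc.reason}
    except socket.timeout:
        # A timeout while waiting for the response is not wrapped in URLError.
        return {"health": "bad", "reason": "timed out after %.1f s" % timeout}
    if status == 200:
        return {"health": "good", "reason": None}
    return {"health": "bad", "reason": "unexpected HTTP status %d" % status}
```

For example, you could call check_liveness("http://engine01.intranet.company.it/ovirt-engine/services/health"). The health URL path is an assumption here and is not confirmed in this thread. When the resolver gives up, getaddrinfo raises a gaierror, so the reason would start with "name resolution failed" instead of a generic "failed liveliness check".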

Luca




Re: [ovirt-users] Debugging why hosted engine flips between EngineUp and EngineBadHealth

2017-12-13 Thread Yaniv Kaul
On Wed, Dec 13, 2017 at 4:15 PM, Luca 'remix_tj' Lorenzetto <
lorenzetto.l...@gmail.com> wrote:

> [...]

Can you make sure the engine FQDN doesn't have multiple IPs registered in
DNS? dig or a similar tool should help.
Y.



Re: [ovirt-users] Debugging why hosted engine flips between EngineUp and EngineBadHealth

2017-12-13 Thread Martin Sivak
Hi,

I am afraid we do not have logs that go that deep into the stack. DNS
resolution issues will definitely affect both the notification system (if
it is not using a localhost SMTP server) and the engine status checks
(because we use the FQDN).
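The broker does not log at that depth. One option is to run a separate probe on the host that gives the network team timestamps to match against tcpdump. The Python 3 sketch below is an assumption and is not part of ovirt-hosted-engine-ha. The function names probe and main and the 10-second interval are hypothetical. The probe calls getaddrinfo() with the same arguments that socket.create_connection() uses in the smtplib traceback from this thread. For each lookup, it records how long the lookup took, whether it failed, and which addresses came back.

```python
import socket
import time
from datetime import datetime


def probe(hostname, resolver=socket.getaddrinfo, clock=time.monotonic):
    """Resolve hostname once; report duration, success and addresses.

    resolver and clock are parameters only so the probe can be tested offline.
    """
    start = clock()
    try:
        # Same arguments socket.create_connection() passes: any family, TCP.
        infos = resolver(hostname, None, 0, socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return {"host": hostname, "ok": False,
                "elapsed": clock() - start, "error": str(exc)}
    # sockaddr[0] is the address for both IPv4 and IPv6 results.
    addresses = sorted({info[4][0] for info in infos})
    return {"host": hostname, "ok": True,
            "elapsed": clock() - start, "addresses": addresses}


def main(hostnames, interval=10.0):
    """Probe each hostname forever, printing one timestamped line per lookup."""
    while True:
        for host in hostnames:
            result = probe(host)
            stamp = datetime.now().isoformat(sep=" ", timespec="milliseconds")
            if result["ok"]:
                detail = "ok " + ",".join(result["addresses"])
            else:
                detail = "FAILED " + result["error"]
            print("%s %s %.3fs %s" % (stamp, host, result["elapsed"], detail),
                  flush=True)
        time.sleep(interval)
```

For example, main(["engine01.intranet.company.it", "internalmx.intranet.company.it"]) prints one line per lookup until interrupted. If a lookup takes many seconds and then fails, that points at unanswered queries. If more than one address appears for a name, that answers the multiple-IP question.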

Best regards

Martin


Re: [ovirt-users] Debugging why hosted engine flips between EngineUp and EngineBadHealth

2017-12-13 Thread Luca 'remix_tj' Lorenzetto
Hello,

Today I started troubleshooting the DNS requests in more depth, and an
EngineUp -> EngineBadHealth event happened exactly while I was looking at
tcpdump.

Looking at the DNS requests, I see this:

[...]
14:30:35.909201 IP kvmhost01.intranet.company.it.55654 > dns.company.it.53: 34102+ A? engine01.intranet.company.it. (54)
14:30:35.909215 IP kvmhost01.intranet.company.it.55654 > dns.company.it.53: 6242+ AAAA? engine01.intranet.company.it. (54)
14:30:40.914285 IP kvmhost01.intranet.company.it.55654 > dns.company.it.53: 34102+ A? engine01.intranet.company.it. (54)
14:30:40.914316 IP kvmhost01.intranet.company.it.55654 > dns.company.it.53: 6242+ AAAA? engine01.intranet.company.it. (54)
14:30:45.918306 IP kvmhost01.intranet.company.it.54885 > dns.company.it.53: 60263+ A? engine01.intranet.company.it.intranet.company.it. (74)
14:30:45.918329 IP kvmhost01.intranet.company.it.54885 > dns.company.it.53: 18681+ AAAA? engine01.intranet.company.it.intranet.company.it. (74)
14:30:50.920376 IP kvmhost01.intranet.company.it.54885 > dns.company.it.53: 60263+ A? engine01.intranet.company.it.intranet.company.it. (74)
14:30:50.920411 IP kvmhost01.intranet.company.it.54885 > dns.company.it.53: 18681+ AAAA? engine01.intranet.company.it.intranet.company.it. (74)
14:30:56.044242 IP kvmhost01.intranet.company.it.58319 > dns.company.it.53: 28413+ A? engine01.intranet.company.it. (54)
14:30:56.044267 IP kvmhost01.intranet.company.it.58319 > dns.company.it.53: 29680+ AAAA? engine01.intranet.company.it. (54)
14:31:01.049761 IP kvmhost01.intranet.company.it.58319 > dns.company.it.53: 28413+ A? engine01.intranet.company.it. (54)
14:31:01.049777 IP kvmhost01.intranet.company.it.58319 > dns.company.it.53: 29680+ AAAA? engine01.intranet.company.it. (54)
14:31:06.052635 IP kvmhost01.intranet.company.it.58093 > dns.company.it.53: 24807+ A? engine01.intranet.company.it.intranet.company.it. (74)
14:31:06.052649 IP kvmhost01.intranet.company.it.58093 > dns.company.it.53: 53745+ AAAA? engine01.intranet.company.it.intranet.company.it. (74)
14:31:11.057724 IP kvmhost01.intranet.company.it.58093 > dns.company.it.53: 24807+ A? engine01.intranet.company.it.intranet.company.it. (74)
14:31:11.057745 IP kvmhost01.intranet.company.it.58093 > dns.company.it.53: 53745+ AAAA? engine01.intranet.company.it.intranet.company.it. (74)
14:31:16.175204 IP kvmhost01.intranet.company.it.44950 > dns.company.it.53: 63680+ A? engine01.intranet.company.it. (54)
14:31:16.175225 IP kvmhost01.intranet.company.it.44950 > dns.company.it.53: 15726+ AAAA? engine01.intranet.company.it. (54)
14:31:19.670746 IP kvmhost01.intranet.company.it.54689 > dns.company.it.53: 40999+ A? kvmsvilca01.intranet.company.it. (49)
14:31:21.180295 IP kvmhost01.intranet.company.it.44950 > dns.company.it.53: 63680+ A? engine01.intranet.company.it. (54)
14:31:21.180337 IP kvmhost01.intranet.company.it.44950 > dns.company.it.53: 15726+ AAAA? engine01.intranet.company.it. (54)
14:31:23.771959 IP kvmhost01.intranet.company.it.53741 > dns.company.it.53: 1707+ A? internalmx.intranet.company.it. (48)
[...]

The last DNS request succeeds and gets the MX address, and immediately
afterwards I receive the email reporting the status change.
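The spacing in this capture fits a particular stub resolver pattern. The resolver waits 5 seconds per try and sends each name twice. After the name as given fails, it appends the search domain intranet.company.it. That adds up to about 20 seconds per lookup, after which the same sequence starts again (14:30:35.9, then 14:30:56.0, then 14:31:16.2). The Python 3 sketch below is a simplified model of that pattern only. It assumes glibc-style resolv.conf defaults (timeout:5, attempts:2, ndots:1) and a single nameserver. These values are inferred from the timestamps and are not confirmed anywhere in this thread. The parallel AAAA queries are ignored, and the function name query_schedule is hypothetical.

```python
def query_schedule(name, search_domains, ndots=1, timeout=5, attempts=2):
    """Model when each A query of one lookup is sent if no reply ever arrives.

    Returns (schedule, total): schedule is a list of
    (seconds_since_first_query, qname) pairs and total is when the lookup
    gives up. Simplified: one nameserver, fixed per-try timeout.
    """
    base = name.rstrip(".")
    as_is = base + "."
    if name.endswith("."):
        # An absolute name is never extended with search domains.
        candidates = [as_is]
    else:
        extended = [base + "." + d.strip(".") + "." for d in search_domains]
        # Names with at least `ndots` dots are tried as given first.
        if base.count(".") >= ndots:
            candidates = [as_is] + extended
        else:
            candidates = extended + [as_is]
    schedule = []
    elapsed = 0
    for qname in candidates:
        for _ in range(attempts):
            schedule.append((elapsed, qname))
            elapsed += timeout
    return schedule, elapsed


schedule, total = query_schedule("engine01.intranet.company.it",
                                 ["intranet.company.it"])
for offset, qname in schedule:
    print("+%2ds A? %s" % (offset, qname))
print("gives up after %d s" % total)
```

The model sends queries at offsets of 0, 5, 10 and 15 seconds and gives up at 20 seconds. That lines up with 14:30:35.9, 40.9, 45.9 and 50.9 in the capture and with the next lookup starting at 14:30:56.0.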

This is clearly a name resolution issue, but broker.log doesn't make that
clear. The only related message I get is:

Thread-16::DEBUG::2017-12-13 14:31:23,657::monitor::126::ovirt_hosted_engine_ha.broker.monitor.Monitor::(get_value) Submonitor engine-health id 139653412040592 current value: {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "up"}
Thread-16::DEBUG::2017-12-13 14:31:23,657::listener::170::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Response: success {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "up"}


But around those messages I see no sign of DNS query errors or anything
similar. Do I need to check other log files?

Luca



-- 
"It is absurd to employ men of excellent intelligence to do calculations
that could be entrusted to anyone if machines were used"
Gottfried Wilhelm von Leibnitz, Philosopher and Mathematician (1646-1716)

"The Internet is the largest library in the world.
But the problem is that the books are all scattered on the floor"
John Allen Paulos, Mathematician (born 1945)

Luca 'remix_tj' Lorenzetto, http://www.remixtj.net , <
lorenzetto.l...@gmail.com>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Debugging why hosted engine flips between EngineUp and EngineBadHealth

2017-12-11 Thread Luca 'remix_tj' Lorenzetto
Hi Martin, Hi all,

*some minutes* have passed and I have the piece of log I'm looking at.

broker.log-upbadup

This morning I got a notice about an EngineBadHealth/EngineUp flip. I
can't identify anything that could have caused it, because everything is
fine up to a few seconds before the bad health report... Do you notice
anything strange?

Thank you,

Luca




Re: [ovirt-users] Debugging why hosted engine flips between EngineUp and EngineBadHealth

2017-12-04 Thread Luca 'remix_tj' Lorenzetto
On Mon, Dec 4, 2017 at 9:31 AM, Martin Sivak  wrote:
> Hi,
>
> please attach the log. You can grep out the connected / disconnected lines.
>
> Look for engine health monitor lines.
>
> Martin


The log is quite big (about 1.5 GB). I'm extracting the messages around
the last EngineBadHealth <-> EngineUp report.

I'll upload it in a few minutes.

Luca




Re: [ovirt-users] Debugging why hosted engine flips between EngineUp and EngineBadHealth

2017-12-04 Thread Martin Sivak
Hi,

please attach the log. You can grep out the connected / disconnected lines.

Look for engine health monitor lines.

Martin
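Here is a minimal Python 3 sketch of that filtering. It assumes that each broker.log entry sits on one line, in the Submonitor engine-health format quoted elsewhere in this thread (Thread-N::LEVEL::timestamp::...::(get_value) Submonitor engine-health id ... current value: {json}). Lines that do not match are skipped, including the connection chatter and other submonitors. Only changes of the "health" field are reported. The names ENGINE_HEALTH and health_transitions are hypothetical. These submonitor lines are DEBUG messages, so they only show up when debug logging is fully enabled.

```python
import json
import re

# One engine-health record from broker.log, for example:
# Thread-16::DEBUG::2017-12-13 14:31:23,657::monitor::126::...::(get_value)
#   Submonitor engine-health id 139653412040592 current value: {...}
ENGINE_HEALTH = re.compile(
    r"^(?P<thread>[^:]+)::(?P<level>[A-Z]+)::"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r".*?Submonitor engine-health id \d+ current value: (?P<value>\{.*\})\s*$"
)


def health_transitions(lines):
    """Yield (timestamp, health, reason) each time the reported health changes."""
    previous = None
    for line in lines:
        match = ENGINE_HEALTH.match(line)
        if match is None:
            continue  # connection handling and other submonitors are ignored
        value = json.loads(match.group("value"))
        health = value.get("health")
        if health != previous:
            yield match.group("ts"), health, value.get("reason")
            previous = health
```

Usage could look like this: with open(path) as log, loop over health_transitions(log) and print each tuple. The broker.log location on the host is not stated in this thread, so path is left to the reader.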


Re: [ovirt-users] Debugging why hosted engine flips between EngineUp and EngineBadHealth

2017-12-02 Thread Luca 'remix_tj' Lorenzetto
Hello,

Today I had several switches between EngineUp and EngineBadHealth, with
broker.log at DEBUG level. Where should I start to identify the root
cause? The log is quite chatty at this level.

Luca

On Fri, Dec 1, 2017 at 1:24 PM, Martin Sivak  wrote:
> Hi,
>
>> [logger_root]
>> level=INFO
>
>> [handler_logfile]
>> level=DEBUG
>
>> Seems to be set already. broker.log already contains DEBUG messages,
>> but syslog does not (and this is good). What about logger_root?
>
> Yeah, I think you should change that one as well to get full debug
> logging. The handler level does nothing if the messages do not get to
> it. And the root logger should not let them in the default
> configuration you have.
>
> Best regards
>
> Martin
>
>> Luca
>>
>> On Fri, Dec 1, 2017 at 12:29 PM, Martin Sivak  wrote:
>>> Hi,
>>>
>>> can you please enable DEBUG logging and then attach broker.log once it
>>> reproduces? See /etc/ovirt-hosted-engine-ha/broker-log.conf for where to
>>> set it (do not forget to restart ovirt-ha-agent and ovirt-ha-broker
>>> afterwards).
>>>
>>> Name resolution issues might indeed be the cause, because the broker
>>> queries a health endpoint over HTTP. If notifications failed because of
>>> an unresolvable name, there is a high chance the same happens to the
>>> health request every now and then.
>>>
>>> Best regards
>>>
>>> Martin Sivak
>>>
>>> On Fri, Dec 1, 2017 at 10:50 AM, Luca 'remix_tj' Lorenzetto
>>>  wrote:
>>>> Hi all,
>>>>
>>>> for some days now my hosted-engine environments (one RHEV 4.0.7, one
>>>> oVirt 4.1.7) have kept sending emails about changes between EngineUp
>>>> and EngineBadHealth.
>>>>
>>>> This is pretty annoying and I'm not able to find the root cause.
>>>>
>>>> The only issue I've seen on the hosts is this error about sending
>>>> emails, which appears randomly from time to time:
>>>>
>>>> Thread-1::ERROR::2017-12-01 03:05:05,084::notifications::39::ovirt_hosted_engine_ha.broker.notifications.Notifications::(send_email) [Errno -2] Name or service not known
>>>> Traceback (most recent call last):
>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/notifications.py", line 26, in send_email
>>>>     timeout=float(cfg["smtp-timeout"]))
>>>>   File "/usr/lib64/python2.7/smtplib.py", line 255, in __init__
>>>>     (code, msg) = self.connect(host, port)
>>>>   File "/usr/lib64/python2.7/smtplib.py", line 315, in connect
>>>>     self.sock = self._get_socket(host, port, self.timeout)
>>>>   File "/usr/lib64/python2.7/smtplib.py", line 290, in _get_socket
>>>>     return socket.create_connection((host, port), timeout)
>>>>   File "/usr/lib64/python2.7/socket.py", line 553, in create_connection
>>>>     for res in getaddrinfo(host, port, 0, SOCK_STREAM):
>>>> gaierror: [Errno -2] Name or service not known
>>>> Thread-6::WARNING::2017-12-01 03:05:05,427::engine_health::130::engine_health.CpuLoadNoEngine::(action) bad health status: Hosted Engine is not up!
>>>>
>>>> There are no errors in the engine logs, and all the API queries made by
>>>> ovirt-hosted-engine-ha return HTTP code 200.
>>>>
>>>> I suspect the switching between EngineUp and EngineBadHealth could be
>>>> due to DNS resolution issues, but no clear message in the log shows
>>>> this, which doesn't help our network admins trace it.
>>>>
>>>> Is there a way to increase the verbosity of broker.log and agent.log?
>>>>
>>>> Luca

Re: [ovirt-users] Debugging why hosted engine flips between EngineUp and EngineBadHealth

2017-12-01 Thread Martin Sivak
Hi,

> [logger_root]
> level=INFO

> [handler_logfile]
> level=DEBUG

> It seems to be already set: the broker.log handler is already at DEBUG,
> but the syslog one is not (which is good). What about logger_root?

Yeah, I think you should change that one as well to get full debug
logging. The handler level has no effect if messages never reach the
handler, and with the default configuration you have, the root logger
does not let DEBUG messages through.
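The root-versus-handler gating described above can be reproduced with Python's standard logging module. This is a minimal sketch assuming broker-log.conf is read with logging.config.fileConfig (its [logger_root]/[handler_*] sections follow that format); the stderr StreamHandler stands in for the real TimedRotatingFileHandler, and the [loggers]/[handlers]/[formatters] sections are a simplified guess at the rest of the file.

```python
import io
import logging
import logging.config

# Simplified, hypothetical stand-in for broker-log.conf: the [logger_root]
# and [handler_logfile] levels mirror the excerpt in this thread, but the
# handler writes to stderr instead of /var/log/ovirt-hosted-engine-ha/broker.log.
CONFIG_TEMPLATE = """
[loggers]
keys=root

[handlers]
keys=logfile

[formatters]
keys=long

[logger_root]
level={root_level}
handlers=logfile

[handler_logfile]
class=StreamHandler
args=(sys.stderr,)
level=DEBUG
formatter=long

[formatter_long]
format=%(asctime)s %(levelname)s %(name)s %(message)s
"""


def configure(root_level):
    """Load the INI config with the given root level and return the root logger."""
    text = CONFIG_TEMPLATE.format(root_level=root_level)
    logging.config.fileConfig(io.StringIO(text), disable_existing_loggers=False)
    return logging.getLogger()


if __name__ == "__main__":
    for level in ("INFO", "DEBUG"):
        root = configure(level)
        # The handler is always at DEBUG; only the root level changes.
        print("root=%s handler=%s debug_reaches_handler=%s" % (
            level,
            logging.getLevelName(root.handlers[0].level),
            root.isEnabledFor(logging.DEBUG),
        ))
        # Written to stderr only on the second pass, when root is DEBUG.
        root.debug("sample DEBUG record")
```

With root at INFO, root.isEnabledFor(logging.DEBUG) is False even though the handler itself is at DEBUG; only raising [logger_root] to DEBUG lets DEBUG records reach the handler. In the real file the syslog handler stays at level=ERROR, so raising the root level should not flood syslog.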

Best regards

Martin

> Luca
>
> On Fri, Dec 1, 2017 at 12:29 PM, Martin Sivak  wrote:
>> Hi,
>>
>> Can you please enable DEBUG logging and then attach broker.log once it
>> reproduces? See /etc/ovirt-hosted-engine-ha/broker-log.conf for where
>> to set it (do not forget to restart ovirt-ha-agent and ovirt-ha-broker
>> afterwards).
>>
>> Name resolution issues might indeed be the cause, because the broker
>> queries a health endpoint over HTTP. If notifications failed because
>> of an unresolvable name, there is a high chance the same happens to
>> the health request every now and then.
>>
>> Best regards
>>
>> Martin Sivak
>>
>> On Fri, Dec 1, 2017 at 10:50 AM, Luca 'remix_tj' Lorenzetto
>>  wrote:
>>> Hi all,
>>>
>>> For some days now, my hosted-engine environments (one RHEV 4.0.7, one
>>> oVirt 4.1.7) have been sending emails about changes between EngineUp and
>>> EngineBadHealth.
>>>
>>> This is pretty annoying, and I'm not able to find the root cause.
>>>
>>> The only issue I've seen on the hosts is this error about sending
>>> emails, which appears randomly from time to time:
>>>
>>> Thread-1::ERROR::2017-12-01 03:05:05,084::notifications::39::ovirt_hosted_engine_ha.broker.notifications.Notifications::(send_email) [Errno -2] Name or service not known
>>> Traceback (most recent call last):
>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/notifications.py", line 26, in send_email
>>>     timeout=float(cfg["smtp-timeout"]))
>>>   File "/usr/lib64/python2.7/smtplib.py", line 255, in __init__
>>>     (code, msg) = self.connect(host, port)
>>>   File "/usr/lib64/python2.7/smtplib.py", line 315, in connect
>>>     self.sock = self._get_socket(host, port, self.timeout)
>>>   File "/usr/lib64/python2.7/smtplib.py", line 290, in _get_socket
>>>     return socket.create_connection((host, port), timeout)
>>>   File "/usr/lib64/python2.7/socket.py", line 553, in create_connection
>>>     for res in getaddrinfo(host, port, 0, SOCK_STREAM):
>>> gaierror: [Errno -2] Name or service not known
>>> Thread-6::WARNING::2017-12-01 03:05:05,427::engine_health::130::engine_health.CpuLoadNoEngine::(action) bad health status: Hosted Engine is not up!
>>>
>>> There are no errors in the engine logs, and all the API queries made by
>>> ovirt-hosted-engine-ha return HTTP code 200.
>>>
>>> I suspect the switching between the EngineUp and EngineBadHealth states
>>> could be due to DNS resolution issues, but there is no clear message in
>>> the log showing this, so our network admins have nothing to go on when
>>> setting up traces.
>>>
>>> Is there a way to increase the verbosity of broker.log and agent.log?
>>>
>>> Luca


Re: [ovirt-users] Debugging why hosted engine flips between EngineUp and EngineBadHealth

2017-12-01 Thread Luca 'remix_tj' Lorenzetto
Hi Martin,

I see this in broker-log.conf:
[logger_root]
level=INFO
handlers=syslog,logfile
propagate=0

[handler_syslog]
level=ERROR
class=handlers.SysLogHandler
formatter=sysform
args=('/dev/log', handlers.SysLogHandler.LOG_USER)

[handler_logfile]
class=logging.handlers.TimedRotatingFileHandler
args=('/var/log/ovirt-hosted-engine-ha/broker.log', 'd', 1, 7)
level=DEBUG
formatter=long

It seems to be already set: the broker.log handler is already at DEBUG,
but the syslog one is not (which is good). What about logger_root?

Luca

On Fri, Dec 1, 2017 at 12:29 PM, Martin Sivak  wrote:
> Hi,
>
> Can you please enable DEBUG logging and then attach broker.log once it
> reproduces? See /etc/ovirt-hosted-engine-ha/broker-log.conf for where
> to set it (do not forget to restart ovirt-ha-agent and ovirt-ha-broker
> afterwards).
>
> Name resolution issues might indeed be the cause, because the broker
> queries a health endpoint over HTTP. If notifications failed because
> of an unresolvable name, there is a high chance the same happens to
> the health request every now and then.
>
> Best regards
>
> Martin Sivak
>
> On Fri, Dec 1, 2017 at 10:50 AM, Luca 'remix_tj' Lorenzetto
>  wrote:
>> Hi all,
>>
>> For some days now, my hosted-engine environments (one RHEV 4.0.7, one
>> oVirt 4.1.7) have been sending emails about changes between EngineUp and
>> EngineBadHealth.
>>
>> This is pretty annoying, and I'm not able to find the root cause.
>>
>> The only issue I've seen on the hosts is this error about sending
>> emails, which appears randomly from time to time:
>>
>> Thread-1::ERROR::2017-12-01 03:05:05,084::notifications::39::ovirt_hosted_engine_ha.broker.notifications.Notifications::(send_email) [Errno -2] Name or service not known
>> Traceback (most recent call last):
>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/notifications.py", line 26, in send_email
>>     timeout=float(cfg["smtp-timeout"]))
>>   File "/usr/lib64/python2.7/smtplib.py", line 255, in __init__
>>     (code, msg) = self.connect(host, port)
>>   File "/usr/lib64/python2.7/smtplib.py", line 315, in connect
>>     self.sock = self._get_socket(host, port, self.timeout)
>>   File "/usr/lib64/python2.7/smtplib.py", line 290, in _get_socket
>>     return socket.create_connection((host, port), timeout)
>>   File "/usr/lib64/python2.7/socket.py", line 553, in create_connection
>>     for res in getaddrinfo(host, port, 0, SOCK_STREAM):
>> gaierror: [Errno -2] Name or service not known
>> Thread-6::WARNING::2017-12-01 03:05:05,427::engine_health::130::engine_health.CpuLoadNoEngine::(action) bad health status: Hosted Engine is not up!
>>
>> There are no errors in the engine logs, and all the API queries made by
>> ovirt-hosted-engine-ha return HTTP code 200.
>>
>> I suspect the switching between the EngineUp and EngineBadHealth states
>> could be due to DNS resolution issues, but there is no clear message in
>> the log showing this, so our network admins have nothing to go on when
>> setting up traces.
>>
>> Is there a way to increase the verbosity of broker.log and agent.log?
>>
>> Luca


Re: [ovirt-users] Debugging why hosted engine flips between EngineUp and EngineBadHealth

2017-12-01 Thread Martin Sivak
Hi,

Can you please enable DEBUG logging and then attach broker.log once it
reproduces? See /etc/ovirt-hosted-engine-ha/broker-log.conf for where
to set it (do not forget to restart ovirt-ha-agent and ovirt-ha-broker
afterwards).

Name resolution issues might indeed be the cause, because the broker
queries a health endpoint over HTTP. If notifications failed because
of an unresolvable name, there is a high chance the same happens to
the health request every now and then.
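To tell the two failure modes apart while waiting for a reproduction, one option is a standalone probe that resolves the name first and only then issues the HTTP request. This is a minimal Python 3 sketch, not what the broker itself runs (the hosts here run Python 2.7, so it would need adapting there); the health URL path /ovirt-engine/services/health and the engine hostname in the usage line are assumptions.

```python
import socket
from urllib.error import HTTPError, URLError
from urllib.parse import urlsplit
from urllib.request import urlopen


def check_health(url, timeout=5.0):
    """Classify one health probe as a DNS, connection or HTTP outcome.

    Returns a short string such as "dns-failure: ...",
    "connect-failure: ..." or "http-200".
    """
    host = urlsplit(url).hostname
    try:
        # Resolve explicitly first, so a name-resolution problem is reported
        # as such instead of being folded into a generic connection error.
        socket.getaddrinfo(host, None, 0, socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # Same exception type as in the broker traceback ([Errno -2] ...).
        return "dns-failure: [Errno %s] %s" % (exc.errno, exc.strerror)
    try:
        with urlopen(url, timeout=timeout) as resp:
            return "http-%d" % resp.status
    except HTTPError as exc:
        # The server answered, but with an error status.
        return "http-%d" % exc.code
    except (URLError, OSError) as exc:
        # Name resolved, but the connection was refused, reset or timed out.
        return "connect-failure: %s" % exc
```

Usage, with a placeholder hostname: `print(check_health("http://engine.example.com/ovirt-engine/services/health"))`. Run periodically, a "dns-failure" result at the time of an EngineBadHealth mail would support the name-resolution theory, while "http-..." or "connect-failure" results would point elsewhere.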

Best regards

Martin Sivak

On Fri, Dec 1, 2017 at 10:50 AM, Luca 'remix_tj' Lorenzetto
 wrote:
> Hi all,
>
> For some days now, my hosted-engine environments (one RHEV 4.0.7, one
> oVirt 4.1.7) have been sending emails about changes between EngineUp and
> EngineBadHealth.
>
> This is pretty annoying, and I'm not able to find the root cause.
>
> The only issue I've seen on the hosts is this error about sending
> emails, which appears randomly from time to time:
>
> Thread-1::ERROR::2017-12-01 03:05:05,084::notifications::39::ovirt_hosted_engine_ha.broker.notifications.Notifications::(send_email) [Errno -2] Name or service not known
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/notifications.py", line 26, in send_email
>     timeout=float(cfg["smtp-timeout"]))
>   File "/usr/lib64/python2.7/smtplib.py", line 255, in __init__
>     (code, msg) = self.connect(host, port)
>   File "/usr/lib64/python2.7/smtplib.py", line 315, in connect
>     self.sock = self._get_socket(host, port, self.timeout)
>   File "/usr/lib64/python2.7/smtplib.py", line 290, in _get_socket
>     return socket.create_connection((host, port), timeout)
>   File "/usr/lib64/python2.7/socket.py", line 553, in create_connection
>     for res in getaddrinfo(host, port, 0, SOCK_STREAM):
> gaierror: [Errno -2] Name or service not known
> Thread-6::WARNING::2017-12-01 03:05:05,427::engine_health::130::engine_health.CpuLoadNoEngine::(action) bad health status: Hosted Engine is not up!
>
> There are no errors in the engine logs, and all the API queries made by
> ovirt-hosted-engine-ha return HTTP code 200.
>
> I suspect the switching between the EngineUp and EngineBadHealth states
> could be due to DNS resolution issues, but there is no clear message in
> the log showing this, so our network admins have nothing to go on when
> setting up traces.
>
> Is there a way to increase the verbosity of broker.log and agent.log?
>
> Luca


Re: [ovirt-users] Debugging why hosted engine flips between EngineUp and EngineBadHealth

2017-12-01 Thread Johan Bernhardsson
We have had a similar issue, which was resolved by restarting the
engine VMs.

Not ideal, but it solves the problem for about a month.

/Johan

On Fri, 2017-12-01 at 10:50 +0100, Luca 'remix_tj' Lorenzetto wrote:
> Hi all,
> 
> For some days now, my hosted-engine environments (one RHEV 4.0.7, one
> oVirt 4.1.7) have been sending emails about changes between EngineUp and
> EngineBadHealth.
>
> This is pretty annoying, and I'm not able to find the root cause.
>
> The only issue I've seen on the hosts is this error about sending
> emails, which appears randomly from time to time:
>
> 
> Thread-1::ERROR::2017-12-01 03:05:05,084::notifications::39::ovirt_hosted_engine_ha.broker.notifications.Notifications::(send_email) [Errno -2] Name or service not known
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/notifications.py", line 26, in send_email
>     timeout=float(cfg["smtp-timeout"]))
>   File "/usr/lib64/python2.7/smtplib.py", line 255, in __init__
>     (code, msg) = self.connect(host, port)
>   File "/usr/lib64/python2.7/smtplib.py", line 315, in connect
>     self.sock = self._get_socket(host, port, self.timeout)
>   File "/usr/lib64/python2.7/smtplib.py", line 290, in _get_socket
>     return socket.create_connection((host, port), timeout)
>   File "/usr/lib64/python2.7/socket.py", line 553, in create_connection
>     for res in getaddrinfo(host, port, 0, SOCK_STREAM):
> gaierror: [Errno -2] Name or service not known
> Thread-6::WARNING::2017-12-01 03:05:05,427::engine_health::130::engine_health.CpuLoadNoEngine::(action) bad health status: Hosted Engine is not up!
> 
> There are no errors in the engine logs, and all the API queries made by
> ovirt-hosted-engine-ha return HTTP code 200.
>
> I suspect the switching between the EngineUp and EngineBadHealth states
> could be due to DNS resolution issues, but there is no clear message in
> the log showing this, so our network admins have nothing to go on when
> setting up traces.
>
> Is there a way to increase the verbosity of broker.log and agent.log?
> 
> Luca
> 


[ovirt-users] Debugging why hosted engine flips between EngineUp and EngineBadHealth

2017-12-01 Thread Luca 'remix_tj' Lorenzetto
Hi all,

For some days now, my hosted-engine environments (one RHEV 4.0.7, one
oVirt 4.1.7) have been sending emails about changes between EngineUp and
EngineBadHealth.

This is pretty annoying, and I'm not able to find the root cause.

The only issue I've seen on the hosts is this error about sending
emails, which appears randomly from time to time:

Thread-1::ERROR::2017-12-01 03:05:05,084::notifications::39::ovirt_hosted_engine_ha.broker.notifications.Notifications::(send_email) [Errno -2] Name or service not known
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/notifications.py", line 26, in send_email
    timeout=float(cfg["smtp-timeout"]))
  File "/usr/lib64/python2.7/smtplib.py", line 255, in __init__
    (code, msg) = self.connect(host, port)
  File "/usr/lib64/python2.7/smtplib.py", line 315, in connect
    self.sock = self._get_socket(host, port, self.timeout)
  File "/usr/lib64/python2.7/smtplib.py", line 290, in _get_socket
    return socket.create_connection((host, port), timeout)
  File "/usr/lib64/python2.7/socket.py", line 553, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
gaierror: [Errno -2] Name or service not known
Thread-6::WARNING::2017-12-01 03:05:05,427::engine_health::130::engine_health.CpuLoadNoEngine::(action) bad health status: Hosted Engine is not up!

There are no errors in the engine logs, and all the API queries made by
ovirt-hosted-engine-ha return HTTP code 200.

I suspect the switching between the EngineUp and EngineBadHealth states
could be due to DNS resolution issues, but there is no clear message in
the log showing this, so our network admins have nothing to go on when
setting up traces.
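As a starting point for such traces, a small watcher could log every lookup of the engine name with a timestamp and its duration, so failures or slow answers can be lined up against the EngineUp/EngineBadHealth mails. This is a minimal Python 3 sketch, assuming it runs on the hypervisor itself so that getaddrinfo goes through the same resolver configuration the broker uses; the hostname in the usage line is a placeholder.

```python
import socket
import time


def resolve_timed(host):
    """Resolve host once; return (ok, detail, elapsed_seconds)."""
    start = time.monotonic()
    try:
        infos = socket.getaddrinfo(host, None, 0, socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # e.g. "[Errno -2] Name or service not known", as in the traceback.
        detail = "[Errno %s] %s" % (exc.errno, exc.strerror)
        return False, detail, time.monotonic() - start
    # Each entry's sockaddr starts with the IP address; de-duplicate them.
    addresses = sorted({info[4][0] for info in infos})
    return True, " ".join(addresses), time.monotonic() - start


def watch(host, interval=10.0, count=None):
    """Print one timestamped line per lookup; loop forever if count is None."""
    done = 0
    while count is None or done < count:
        ok, detail, elapsed = resolve_timed(host)
        print("%s %s %s %.3fs %s" % (
            time.strftime("%Y-%m-%d %H:%M:%S"), host,
            "OK" if ok else "FAIL", elapsed, detail))
        done += 1
        if count is None or done < count:
            time.sleep(interval)
```

Usage: `watch("engine.example.com", interval=5)`. Because getaddrinfo goes through the system resolver (nsswitch), a local cache such as nscd may answer some lookups without any DNS traffic, so a capture on the wire may show fewer queries than this log.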

Is there a way to increase the verbosity of broker.log and agent.log?

Luca
