[ovirt-users] Re: Unassigned hosts

2020-08-07 Thread Nardus Geldenhuys
Hi Artur

Hope you are well. Please see below; this is after I restarted the engine:

host:
[root@ovirt-aa-1-21:~]↥ # tcpdump -i ovirtmgmt -c 1000 -nnvvS dst ovirt-engine-aa-1-01
tcpdump: listening on ovirtmgmt, link-type EN10MB (Ethernet), capture size 262144 bytes
2020-08-07 12:09:32.553543 ARP, Ethernet (len 6), IPv4 (len 4), Reply 172.140.220.111 is-at 00:25:b5:04:00:25, length 28
2020-08-07 12:10:05.584594 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    172.140.220.111.54321 > 172.140.220.23.56202: Flags [S.], cksum 0x5cd5 (incorrect -> 0xc8ca), seq 4036072905, ack 3265413231, win 28960, options [mss 1460,sackOK,TS val 3039504636 ecr 341411251,nop,wscale 7], length 0
2020-08-07 12:10:10.589276 ARP, Ethernet (len 6), IPv4 (len 4), Reply 172.140.220.111 is-at 00:25:b5:04:00:25, length 28
2020-08-07 12:10:15.596230 IP (tos 0x0, ttl 64, id 48438, offset 0, flags [DF], proto TCP (6), length 52)
    172.140.220.111.54321 > 172.140.220.23.56202: Flags [F.], cksum 0x5ccd (incorrect -> 0x40b8), seq 4036072906, ack 3265413231, win 227, options [nop,nop,TS val 3039514647 ecr 341411251], length 0
2020-08-07 12:10:20.596429 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 172.140.220.23 tell 172.140.220.111, length 28
2020-08-07 12:10:20.663699 IP (tos 0x0, ttl 64, id 64726, offset 0, flags [DF], proto TCP (6), length 40)
    172.140.220.111.54321 > 172.140.220.23.56202: Flags [R], cksum 0x1d20 (correct), seq 4036072907, win 0, length 0

engine:
[root@ovirt-engine-aa-1-01 ~]# tcpdump -i eth0 -c 1000 -nnvvS src ovirt-aa-1-21
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
2020-08-07 12:09:31.891242 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    172.140.220.111.54321 > 172.140.220.23.56202: Flags [S.], cksum 0xc8ca (correct), seq 4036072905, ack 3265413231, win 28960, options [mss 1460,sackOK,TS val 3039504636 ecr 341411251,nop,wscale 7], length 0
2020-08-07 12:09:36.895502 ARP, Ethernet (len 6), IPv4 (len 4), Reply 172.140.220.111 is-at 00:25:b5:04:00:25, length 42
2020-08-07 12:09:41.901981 IP (tos 0x0, ttl 64, id 48438, offset 0, flags [DF], proto TCP (6), length 52)
    172.140.220.111.54321 > 172.140.220.23.56202: Flags [F.], cksum 0x40b8 (correct), seq 4036072906, ack 3265413231, win 227, options [nop,nop,TS val 3039514647 ecr 341411251], length 0
2020-08-07 12:09:46.901681 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 172.140.220.23 tell 172.140.220.111, length 42
2020-08-07 12:09:46.968911 IP (tos 0x0, ttl 64, id 64726, offset 0, flags [DF], proto TCP (6), length 40)
    172.140.220.111.54321 > 172.140.220.23.56202: Flags [R], cksum 0x1d20 (correct), seq 4036072907, win 0, length 0

Regards

Nar


[ovirt-users] Re: Unassigned hosts

2020-08-07 Thread Artur Socha
Hi Nardus,
There is one more thing to be checked.

1) Could you check whether any packets are sent from the affected host to
the engine?
On the host:
# outgoing traffic
 sudo  tcpdump -i  -c 1000 -nnvvS dst


2) Same the other way round: check whether the engine receives any packets
from the affected host.
On the engine:
# incoming traffic
sudo  tcpdump -i  -c 1000 -nnvvS src
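A parameterized sketch of the two captures, so the same script can be run on both sides. The default interface names are the ones Nardus used in his reply (`ovirtmgmt` on the host, `eth0` on the engine); the FQDNs, the variable names and the `capture_cmd` helper are hypothetical placeholders, not part of any oVirt tooling.

```shell
#!/bin/sh
# Sketch: the two tcpdump checks, with interface and host names as variables.
# All defaults below are placeholders (assumptions); override via environment.
ENGINE_FQDN="${ENGINE_FQDN:-engine.example.com}"
HOST_FQDN="${HOST_FQDN:-host.example.com}"
HOST_NIC="${HOST_NIC:-ovirtmgmt}"
ENGINE_NIC="${ENGINE_NIC:-eth0}"

# Print the tcpdump command line for one side: "host" or "engine".
capture_cmd() {
  case "$1" in
    host)   # 1) outgoing traffic: what the host sends to the engine
      echo "tcpdump -i $HOST_NIC -c 1000 -nnvvS dst $ENGINE_FQDN" ;;
    engine) # 2) incoming traffic: what the engine receives from the host
      echo "tcpdump -i $ENGINE_NIC -c 1000 -nnvvS src $HOST_FQDN" ;;
    *)
      echo "usage: capture_cmd host|engine" >&2
      return 1 ;;
  esac
}

# Capture only when given an argument, e.g. "./capture.sh host" on the host.
if [ $# -gt 0 ]; then
  cmd=$(capture_cmd "$1") || exit 1
  # Deliberately unquoted so the string splits into tcpdump's arguments.
  sudo $cmd
fi
```

Running it with `host` on the hypervisor and `engine` on the engine machine at the same time gives two captures that can be compared packet by packet.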


Artur



[ovirt-users] Re: Unassigned hosts

2020-08-06 Thread Martin Perina
Hi Nardus,

I'm assuming that your setup was stable and you were able to run your VMs
without problems. If so, then what is suggested below is not a solution to
your problem; you should really check the engine and VDSM logs for the
reasons why your hosts became NonResponsive. Most probably there is an
underlying storage or network issue which prevents correct engine <-> host
communication and which made your hosts NonResponsive. The suggestion below
will just hide the issues you currently have.

If your problem started suddenly after you significantly increased the
number of running VMs or decreased the number of available hosts, then you
are hitting these issues because you don't have enough resources.
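As a starting point for the log check, a minimal sketch that filters ERROR/WARN lines mentioning a given host. It assumes the default log locations (`/var/log/ovirt-engine/engine.log` on the engine, `/var/log/vdsm/vdsm.log` on the host); the `recent_problems` function is a hypothetical helper made up for illustration.

```shell
#!/bin/sh
# Sketch: recent problems from the usual oVirt log files (default paths are
# assumptions; override ENGINE_LOG / VDSM_LOG if your install differs).
ENGINE_LOG="${ENGINE_LOG:-/var/log/ovirt-engine/engine.log}"
VDSM_LOG="${VDSM_LOG:-/var/log/vdsm/vdsm.log}"

# Print the last N (default 50) ERROR/WARN lines in FILE containing PATTERN.
recent_problems() {
  file=$1 pattern=$2 n=${3:-50}
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
  grep -E 'ERROR|WARN' "$file" | grep -F -- "$pattern" | tail -n "$n"
}

# Examples:
#   recent_problems "$ENGINE_LOG" someserver   # on the engine
#   recent_problems "$VDSM_LOG" ''             # on the affected host
```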

Regards,
Martin

On Thu, Aug 6, 2020 at 4:51 PM Artur Socha  wrote:

> Thanks Nardus,
> After a quick look I found what I was suspecting - there are way too many
> threads in Blocked state. I don't know yet the reason but this is very
> helpful. I'll let you know about the findings/investigation. Meanwhile, you
> may try restarting the engine as (a very brute and ugly) workaround).
> You may try to setup slightly bigger thread pool - may save you some time
> until the next hiccup. However, please be aware that this may come with the
> cost in memory usage and higher cpu usage (due to increased context
> switching)
> Here are some docs:
>
> # Specify the thread pool size for jboss managed scheduled executor service 
> used by commands to periodically execute
> # methods. It is generally not necessary to increase the number of threads in 
> this thread pool. To change the value
> # permanently create a conf file 99-engine-scheduled-thread-pool.conf in 
> /etc/ovirt-engine/engine.conf.d/
> ENGINE_SCHEDULED_THREAD_POOL_SIZE=100
>
>
> A.

[ovirt-users] Re: Unassigned hosts

2020-08-06 Thread Artur Socha
Thanks Nardus,
After a quick look I found what I was suspecting: there are way too many
threads in the BLOCKED state. I don't know the reason yet, but this is very
helpful. I'll let you know about the findings of the investigation.
Meanwhile, you may try restarting the engine as a (very brute and ugly)
workaround. You may also try to set up a slightly bigger thread pool, which
may buy you some time until the next hiccup. However, please be aware that
this may come at the cost of higher memory and CPU usage (due to increased
context switching).
Here are some docs:

# Specify the thread pool size for jboss managed scheduled executor service
# used by commands to periodically execute methods. It is generally not
# necessary to increase the number of threads in this thread pool. To change
# the value permanently create a conf file 99-engine-scheduled-thread-pool.conf
# in /etc/ovirt-engine/engine.conf.d/
ENGINE_SCHEDULED_THREAD_POOL_SIZE=100
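A minimal sketch of following the comment's instructions. `CONF_DIR`, `POOL_SIZE` and `write_pool_conf` are names introduced here, and 150 is only an illustrative "slightly bigger" value compared with the 100 shown above, not a recommendation.

```shell
#!/bin/sh
# Sketch: persist a larger scheduled thread pool size for ovirt-engine.
# POOL_SIZE=150 is an illustrative assumption; every extra thread costs
# memory and adds context switching, as noted above.
CONF_DIR="${CONF_DIR:-/etc/ovirt-engine/engine.conf.d}"
POOL_SIZE="${POOL_SIZE:-150}"

write_pool_conf() {
  mkdir -p "$CONF_DIR" || return 1
  printf 'ENGINE_SCHEDULED_THREAD_POOL_SIZE=%s\n' "$POOL_SIZE" \
    > "$CONF_DIR/99-engine-scheduled-thread-pool.conf"
}

# Apply (as root): write_pool_conf && systemctl restart ovirt-engine
```

The restart is presumably needed because the engine reads `engine.conf.d` at startup.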


A.


On Thu, Aug 6, 2020 at 4:19 PM Nardus Geldenhuys  wrote:

> Hi Artur
>
> Please find them attached, and let me know if I need to rerun. They are
> 5 min apart.
>
> [root@engine-aa-1-01 ovirt-engine]#  ps -ef | grep jboss | grep -v grep | awk '{ print $2 }'
> 27390
> [root@engine-aa-1-01 ovirt-engine]# jstack -F 27390 > your_engine_thread_dump_1.txt
> [root@engine-aa-1-01 ovirt-engine]# jstack -F 27390 > your_engine_thread_dump_2.txt
> [root@engine-aa-1-01 ovirt-engine]# jstack -F 27390 > your_engine_thread_dump_3.txt
>
> Regards
>
> Nar

[ovirt-users] Re: Unassigned hosts

2020-08-06 Thread Artur Socha
Sure thing.
On the engine host, please find the jboss PID. You can use this command:

 ps -ef | grep jboss | grep -v grep | awk '{ print $2 }'

or the jps tool from the JDK. Sample output on my dev environment:

± % jps
   !2860
64853 jboss-modules.jar
196217 Jps

Then use jstack from the JDK:
jstack   > your_engine_thread_dump.txt
Two or three dumps taken at approximately 5-minute intervals would be even
more useful.

You can find even more options here:
https://www.baeldung.com/java-thread-dump
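A minimal sketch that automates the repeated dumps. It assumes `pgrep` is available and that the engine JVM is the only process whose command line contains `jboss-modules.jar` (as in the jps output above); `COUNT`, `INTERVAL`, `find_engine_pid` and `take_dumps` are names made up for this sketch.

```shell
#!/bin/sh
# Sketch: take COUNT jstack dumps of the engine JVM, INTERVAL seconds apart.
COUNT="${COUNT:-3}"
INTERVAL="${INTERVAL:-300}"   # 5 minutes

# Assumption: the engine JVM is the first process matching jboss-modules.jar.
find_engine_pid() {
  pgrep -f jboss-modules.jar | head -n 1
}

take_dumps() {
  pid=$1
  i=1
  while [ "$i" -le "$COUNT" ]; do
    # jstack usually has to run as the JVM's owner; -F forces a dump when
    # the JVM does not respond to a normal request.
    jstack "$pid" > "your_engine_thread_dump_$i.txt" || return 1
    [ "$i" -lt "$COUNT" ] && sleep "$INTERVAL"   # no sleep after the last dump
    i=$((i + 1))
  done
}

# Usage: take_dumps "$(find_engine_pid)"
```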

Artur

On Thu, Aug 6, 2020 at 3:15 PM Nardus Geldenhuys  wrote:

> Hi
>
> Can create thread dump, please send details on howto.
>
> Regards
>
> Nardus
>
> On Thu, 6 Aug 2020 at 14:17, Artur Socha  wrote:
>
>> Hi Nardus,
>> You might have hit an issue I have been hunting for some time ( [1] and
>> [2] ).
>> [1] could not be properly resolved because at a time was not able to
>> recreate an issue on dev setup.
>> I suspect [2] is related.
>>
>> Would you be able to prepare a thread dump from your engine instance?
>> Additionally, please check for potential libvirt errors/warnings.
>> Can you also paste the output of:
>> sudo yum list installed | grep vdsm
>> sudo yum list installed | grep ovirt-engine
>> sudo yum list installed | grep libvirt
>>
>> Usually, according to previous reports, restarting the engine helps to
>> restore connectivity with hosts ... at least for some time.
>>
>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1845152
>> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1846338
>>
>> regards,
>> Artur
>>
>>
>>
>> On Thu, Aug 6, 2020 at 8:01 AM Nardus Geldenhuys 
>> wrote:
>>
>>> Also see this in engine:
>>>
>>> Aug 6, 2020, 7:37:17 AM
>>> VDSM someserver command Get Host Capabilities failed: Message timeout
>>> which can be caused by communication issues
>>>
>>> On Thu, 6 Aug 2020 at 07:09, Strahil Nikolov 
>>> wrote:
>>>
 Can you check for errors on the affected host. Most probably you need
 the vdsm logs.

 Best Regards,
 Strahil Nikolov

 On 6 August 2020 7:40:23 GMT+03:00, Nardus Geldenhuys <
 nard...@gmail.com> wrote:
 >Hi Strahil
 >
 >Hope you are well. I got the following error when I tried to confirm
 >the reboot:
 >
 >Error while executing action: Cannot confirm 'Host has been rebooted'
 >Host.
 >Valid Host statuses are "Non operational", "Maintenance" or
 >"Connecting".
 >
 >And I can't put it into maintenance; the only options are "restart" or
 >"stop".
 >
 >Regards
 >
 >Nar
 >
 >On Thu, 6 Aug 2020 at 06:16, Strahil Nikolov 
 >wrote:
 >
 >> After rebooting the node, have you "marked" it as rebooted?
 >>
 >> Best Regards,
 >> Strahil Nikolov
 >>
 >> On 5 August 2020 21:29:04 GMT+03:00, Nardus Geldenhuys <
 >> nard...@gmail.com> wrote:
 >> >Hi oVirt land
 >> >
 >> >Hope you are well. Got a bit of an issue, actually a big issue. We had
 >> >some sort of dip. All the VMs are still running, but some of the hosts
 >> >are showing "Unassigned" or "NonResponsive". All the hosts were showing
 >> >UP and were fine before our dip. I increased vdsHeartbeatInSecond to
 >> >240, no luck.
 >> >
 >> >I still get a timeout on the engine lock even though I can connect to
 >> >that host from the engine using nc to test port 54321. I also restarted
 >> >vdsmd and rebooted the host, with no luck.
 >> >
 >> > nc -v someserver 54321
 >> >Ncat: Version 7.50 ( https://nmap.org/ncat )
 >> >Ncat: Connected to 172.40.2.172:54321.
 >> >
 >> >2020-08-05 20:20:34,256+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-70) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM someserver command Get Host Capabilities failed: Message timeout which can be caused by communication issues
 >> >
 >> >Any troubleshooting ideas will be gladly appreciated.
 >> >
 >> >Regards
 >> >
 >> >Nar
 >>

>>> ___
>>> Users mailing list -- users@ovirt.org
>>> To unsubscribe send an email to users-le...@ovirt.org
>>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
>>> oVirt Code of Conduct:
>>> https://www.ovirt.org/community/about/community-guidelines/
>>> List Archives:
>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/C4HB2J3MH76FI2325Z4AV4VCCEKH4M3S/
>>>
>>
>>
>> --
>> Artur Socha
>> Senior Software Engineer, RHV
>> Red Hat
>>
>

-- 
Artur Socha
Senior Software Engineer, RHV
Red Hat

[ovirt-users] Re: Unassigned hosts

2020-08-06 Thread Nardus Geldenhuys
Hi

I can create a thread dump; please send details on how to do it.

Regards

Nardus

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/YYUSMPZHIKV57X3L44ODZV47IMFQEVZE/


[ovirt-users] Re: Unassigned hosts

2020-08-06 Thread Nardus Geldenhuys
Hi

[root@engine-aa-1-01 ovirt-engine]# sudo yum list installed | grep vdsm
vdsm-jsonrpc-java.noarch                    1.4.18-1.el7    @ovirt-4.3
[root@engine-aa-1-01 ovirt-engine]# sudo yum list installed | grep vdsm
vdsm-jsonrpc-java.noarch                    1.4.18-1.el7    @ovirt-4.3
[root@engine-aa-1-01 ovirt-engine]# sudo yum list installed | grep ovirt-engine
ovirt-engine.noarch                         4.3.6.7-1.el7   @ovirt-4.3
ovirt-engine-api-explorer.noarch            0.0.5-1.el7     @ovirt-4.3
ovirt-engine-backend.noarch                 4.3.6.7-1.el7   @ovirt-4.3
ovirt-engine-dbscripts.noarch               4.3.6.7-1.el7   @ovirt-4.3
ovirt-engine-dwh.noarch                     4.3.6-1.el7     @ovirt-4.3
ovirt-engine-dwh-setup.noarch               4.3.6-1.el7     @ovirt-4.3
ovirt-engine-extension-aaa-jdbc.noarch      1.1.10-1.el7    @ovirt-4.3
ovirt-engine-extension-aaa-ldap.noarch      1.3.10-1.el7    @ovirt-4.3
ovirt-engine-extension-aaa-ldap-setup.noarch
ovirt-engine-extensions-api-impl.noarch     4.3.6.7-1.el7   @ovirt-4.3
ovirt-engine-metrics.noarch                 1.3.4.1-1.el7   @ovirt-4.3
ovirt-engine-restapi.noarch                 4.3.6.7-1.el7   @ovirt-4.3
ovirt-engine-setup.noarch                   4.3.6.7-1.el7   @ovirt-4.3
ovirt-engine-setup-base.noarch              4.3.6.7-1.el7   @ovirt-4.3
ovirt-engine-setup-plugin-cinderlib.noarch  4.3.6.7-1.el7   @ovirt-4.3
ovirt-engine-setup-plugin-ovirt-engine.noarch
ovirt-engine-setup-plugin-ovirt-engine-common.noarch
ovirt-engine-setup-plugin-vmconsole-proxy-helper.noarch
ovirt-engine-setup-plugin-websocket-proxy.noarch
ovirt-engine-tools.noarch                   4.3.6.7-1.el7   @ovirt-4.3
ovirt-engine-tools-backup.noarch            4.3.6.7-1.el7   @ovirt-4.3
ovirt-engine-ui-extensions.noarch           1.0.10-1.el7    @ovirt-4.3
ovirt-engine-vmconsole-proxy-helper.noarch  4.3.6.7-1.el7   @ovirt-4.3
ovirt-engine-webadmin-portal.noarch         4.3.6.7-1.el7   @ovirt-4.3
ovirt-engine-websocket-proxy.noarch         4.3.6.7-1.el7   @ovirt-4.3
ovirt-engine-wildfly.x86_64                 17.0.1-1.el7    @ovirt-4.3
ovirt-engine-wildfly-overlay.noarch         17.0.1-1.el7    @ovirt-4.3
python-ovirt-engine-sdk4.x86_64             4.3.2-2.el7     @ovirt-4.3
python2-ovirt-engine-lib.noarch             4.3.6.7-1.el7   @ovirt-4.3
[root@engine-aa-1-01 ovirt-engine]# sudo yum list installed | grep libvirt
[root@engine-aa-1-01 ovirt-engine]#

I can send more info if needed. And yes, it looks like restarting the
engine sometimes helps.

Regards

Nardus


[ovirt-users] Re: Unassigned hosts

2020-08-06 Thread Artur Socha
Hi Nardus,
You might have hit an issue I have been hunting for some time ([1] and
[2]).
[1] could not be properly resolved because at the time I was not able to
reproduce the issue on a dev setup.
I suspect [2] is related.

Would you be able to prepare a thread dump from your engine instance?
Additionally, please check for potential libvirt errors/warnings.
Can you also paste the output of:
sudo yum list installed | grep vdsm
sudo yum list installed | grep ovirt-engine
sudo yum list installed | grep libvirt
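The three `yum list installed | grep` commands above can be gathered in one pass. A minimal sketch, assuming Python 3.6+ and that `rpm -qa` is available on the machine; `PACKAGE_PATTERNS`, `filter_packages` and `installed_packages` are hypothetical names used only for illustration. Like `grep`, it does plain substring matching, so an empty list for `libvirt` on the engine is expected.

```python
import subprocess

# Hypothetical pattern list mirroring the three grep commands above.
PACKAGE_PATTERNS = ("vdsm", "ovirt-engine", "libvirt")


def filter_packages(names, patterns=PACKAGE_PATTERNS):
    """Group package names by the pattern they contain, like `grep PATTERN`."""
    grouped = {p: [] for p in patterns}
    for name in names:
        for pattern in patterns:
            if pattern in name:
                grouped[pattern].append(name)
    return {p: sorted(v) for p, v in grouped.items()}


def installed_packages():
    """Return installed package names as printed by `rpm -qa`."""
    result = subprocess.run(["rpm", "-qa"], stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return result.stdout.split()
```

Usage: `for pattern, pkgs in filter_packages(installed_packages()).items(): print(pattern, pkgs)`.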

According to previous reports, restarting the engine usually restores
connectivity with the hosts ... at least for some time.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1845152
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1846338

regards,
Artur



On Thu, Aug 6, 2020 at 8:01 AM Nardus Geldenhuys  wrote:

> [...]


-- 
Artur Socha
Senior Software Engineer, RHV
Red Hat
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/RZPEGTZ6WD35MMSHF357RQI34E66N7MB/


[ovirt-users] Re: Unassigned hosts

2020-08-06 Thread Nardus Geldenhuys
I also see this in the engine:

Aug 6, 2020, 7:37:17 AM
VDSM someserver command Get Host Capabilities failed: Message timeout which
can be caused by communication issues
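A successful `nc` connection to port 54321, as in the original report, only proves that the TCP handshake completed; the engine's "Message timeout" means no reply came back on the channel it actually uses (JSON-RPC over STOMP, normally wrapped in TLS on that port). A minimal sketch of a probe that separates those two stages, assuming Python 3; `probe` is a hypothetical helper, and certificate verification is switched off only because it checks liveness, not identity. It does not speak vdsm's protocol, so a completed TLS handshake still does not guarantee that vdsm answers requests.

```python
import socket
import ssl


def probe(host, port, timeout=5.0):
    """Return (tcp_ok, tls_ok, detail) for a TLS service such as vdsm on 54321."""
    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return False, False, "TCP connect failed: %s" % exc
    # Liveness check only: a real check should load the engine CA instead.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        # wrap_socket performs the handshake, inheriting raw's timeout.
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return True, True, "TLS handshake OK (%s)" % tls.version()
    except socket.timeout:
        return True, False, "TCP connected, but TLS handshake timed out"
    except (ssl.SSLError, OSError) as exc:
        return True, False, "TCP connected, but TLS failed: %s" % exc
    finally:
        raw.close()  # no-op if wrap_socket already took ownership
```

A result of `(True, False, "... timed out")` would match the symptom in this thread: the port accepts connections, but nothing answers in time.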

On Thu, 6 Aug 2020 at 07:09, Strahil Nikolov  wrote:

> [...]
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/C4HB2J3MH76FI2325Z4AV4VCCEKH4M3S/


[ovirt-users] Re: Unassigned hosts

2020-08-05 Thread Nardus Geldenhuys
I restarted vdsmd on the host:

mom.log:
2020-08-06 07:21:19,053 - mom.GuestManager - INFO - Guest Manager ending
2020-08-06 07:21:20,483 - mom.HostMonitor - INFO - Host Monitor ending
2020-08-06 07:21:24,795 - mom - INFO - MOM starting
2020-08-06 07:21:24,833 - mom - INFO - hypervisor interface vdsmjsonrpcclient
2020-08-06 07:21:24,833 - mom.HostMonitor - INFO - Host Monitor starting
2020-08-06 07:21:24,880 - mom - ERROR - Failed to initialize MOM threads
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/mom/__init__.py", line 29, in run
    hypervisor_iface = self.get_hypervisor_interface()
  File "/usr/lib/python2.7/site-packages/mom/__init__.py", line 217, in get_hypervisor_interface
    return module.instance(self.config)
  File "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmjsonrpcclientInterface.py", line 96, in instance
    return JsonRpcVdsmClientInterface()
  File "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmjsonrpcclientInterface.py", line 31, in __init__
    self._vdsm_api = client.connect(host="localhost")
  File "/usr/lib/python2.7/site-packages/vdsm/client.py", line 157, in connect
    raise ConnectionError(host, port, use_tls, timeout, e)
ConnectionError: Connection to localhost:54321 with use_tls=True, timeout=60 failed: [Errno 111] Connection refused
2020-08-06 07:21:30,085 - mom - INFO - MOM starting
2020-08-06 07:21:30,122 - mom.HostMonitor - INFO - Host Monitor starting
2020-08-06 07:21:30,123 - mom - INFO - hypervisor interface vdsmjsonrpcclient
2020-08-06 07:21:30,217 - mom.HostMonitor - INFO - HostMonitor is ready
2020-08-06 07:21:30,221 - mom.GuestManager - INFO - Guest Manager starting: multi-thread
2020-08-06 07:21:30,226 - mom.Policy - INFO - Loaded policy '00-defines'
2020-08-06 07:21:30,228 - mom.Policy - INFO - Loaded policy '01-parameters'
2020-08-06 07:21:30,241 - mom.Policy - INFO - Loaded policy '02-balloon'
2020-08-06 07:21:30,263 - mom.Policy - INFO - Loaded policy '03-ksm'
2020-08-06 07:21:30,290 - mom.Policy - INFO - Loaded policy '04-cputune'
2020-08-06 07:21:30,321 - mom.Policy - INFO - Loaded policy '05-iotune'
2020-08-06 07:21:30,321 - mom.PolicyEngine - INFO - Policy Engine starting
2020-08-06 07:21:30,322 - mom.RPCServer - INFO - Using unix socket /var/run/vdsm/mom-vdsm.sock
2020-08-06 07:21:30,323 - mom.RPCServer - INFO - RPC Server starting
2020-08-06 07:21:40,692 - mom.RPCServer - INFO - ping()
2020-08-06 07:21:40,692 - mom.RPCServer - INFO - getStatistics()
2020-08-06 07:21:45,356 - mom.Controllers.KSM - INFO - Updating KSM configuration: pages_to_scan:0 merge_across_nodes:1 run:0 sleep_millisecs:0
2020-08-06 07:21:55,838 - mom.RPCServer - INFO - ping()
2020-08-06 07:21:55,839 - mom.RPCServer - INFO - getStatistics()
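In the mom.log excerpt above, the traceback is MOM failing to reach vdsm on localhost:54321 (`[Errno 111] Connection refused`) right after vdsmd was restarted; about five seconds later MOM starts again and logs "HostMonitor is ready", so that particular error looks like a start-up ordering race rather than the cause of the Unassigned state. A minimal sketch of the retry-until-listening pattern, assuming Python 3; `wait_for_port` and its parameters are hypothetical and this is not MOM's actual code.

```python
import socket
import time


def wait_for_port(host, port, attempts=5, delay=1.0, timeout=2.0):
    """Try to open a TCP connection up to `attempts` times.

    Returns the 1-based attempt number that succeeded, or None if every
    attempt failed (e.g. ECONNREFUSED while the service is still starting).
    """
    for attempt in range(1, attempts + 1):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return attempt
        except OSError:
            if attempt < attempts:
                time.sleep(delay)  # give the service time to start listening
    return None
```

Usage: `wait_for_port("localhost", 54321)` on the host returns `1` once vdsm is listening.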

supervdsm.log:
MainProcess|jsonrpc/3::DEBUG::2020-08-05 20:11:14,139::supervdsm_server::99::SuperVdsm.ServerCallback::(wrapper) call ksmTune with ({u'run': 0, u'merge_across_nodes': 1},) {}
MainProcess|jsonrpc/3::DEBUG::2020-08-05 20:11:14,139::supervdsm_server::106::SuperVdsm.ServerCallback::(wrapper) return ksmTune with None
MainProcess::DEBUG::2020-08-06 07:21:25,279::supervdsm_server::99::SuperVdsm.ServerCallback::(wrapper) call multipath_status with (,) {}
MainProcess::DEBUG::2020-08-06 07:21:25,279::logutils::319::root::(_report_stats) ThreadedHandler is ok in the last 40234 seconds (max pending: 3)
MainProcess::DEBUG::2020-08-06 07:21:25,279::commands::198::storage.Misc.excCmd::(execCmd) /usr/bin/taskset --cpu-list 0-95 /usr/sbin/dmsetup status --target multipath (cwd None)
MainProcess::DEBUG::2020-08-06 07:21:25,283::commands::219::storage.Misc.excCmd::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess::DEBUG::2020-08-06 07:21:25,289::supervdsm_server::106::SuperVdsm.ServerCallback::(wrapper) return multipath_status with {
    u'T1_58886_2121': [PathStatus(name=u'sdd', status=u'A'), PathStatus(name=u'sdm', status=u'A')],
    u'T0_someserver_boot_58886_20c2': [PathStatus(name=u'sdi', status=u'A'), PathStatus(name=u'sdr', status=u'A')],
    u'T0_R4_UCS_MOB1P_DIGIT_58886_20c8': [PathStatus(name=u'sdg', status=u'A'), PathStatus(name=u'sdp', status=u'A')],
    u'T0_58886_215d': [PathStatus(name=u'sdb', status=u'A'), PathStatus(name=u'sdk', status=u'A')],
    u'T0_R4_UCS_MOB1P_DIGIT_58886_20c7': [PathStatus(name=u'sdf', status=u'A'), PathStatus(name=u'sdo', status=u'A')],
    u'T0_58886_20b8': [PathStatus(name=u'sde', status=u'A'), PathStatus(name=u'sdn', status=u'A')],
    u'T0_58886_208a': [PathStatus(name=u'sdc', status=u'A'), PathStatus(name=u'sdl', status=u'A')],
    u'T0_58886_2124': [PathStatus(name=u'sdh', status=u'A'), PathStatus(name=u'sdq', status=u'A')],
    u'T0_58886_215c': [PathStatus(name=u'sda', status=u'A'), PathStatus(name=u'sdj', status=u'A')]}
MainProcess|hsm/init::DEBUG::2020-08-06 07:21:25,383::supervdsm_server::99::SuperVdsm.ServerCallback::(wrapper) call hbaRescan with (,) {}
MainProcess|hsm/init::DEBUG::2020-08-06 07:21:25,384::commands::198::storage.HBA::(execCmd) /usr/bin/taskset --cpu-list 0-95
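The `multipath_status` result in the supervdsm.log excerpt above maps each multipath device to its paths, and every path reports `status=u'A'`, which as far as I know is dmsetup's "active" path state (failed paths show up as `F`). So storage paths on this host look healthy. A minimal sketch for spotting degraded maps in that structure, assuming Python 3; `PathStatus` here is a stand-in namedtuple rather than vdsm's own class, and `degraded_maps` and the `min_active=2` threshold (every map in the log has two paths) are hypothetical.

```python
from collections import namedtuple

# Stand-in for the PathStatus objects printed in supervdsm.log.
PathStatus = namedtuple("PathStatus", ["name", "status"])


def degraded_maps(status_by_map, min_active=2):
    """Return {map_name: [non-active path names]} for maps that look degraded.

    A map is reported when it has fewer than `min_active` paths in state 'A'.
    """
    result = {}
    for map_name, paths in status_by_map.items():
        active = [p for p in paths if p.status == "A"]
        if len(active) < min_active:
            result[map_name] = [p.name for p in paths if p.status != "A"]
    return result


sample = {
    "T1_58886_2121": [PathStatus("sdd", "A"), PathStatus("sdm", "A")],
    "T0_58886_215c": [PathStatus("sda", "A"), PathStatus("sdj", "A")],
}
print(degraded_maps(sample))  # prints {} because both maps have two active paths
```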

[ovirt-users] Re: Unassigned hosts

2020-08-05 Thread Strahil Nikolov via Users
Can you check for errors on the affected host? You will most likely need the
vdsm logs.

Best Regards,
Strahil Nikolov

On 6 August 2020 7:40:23 GMT+03:00, Nardus Geldenhuys wrote:
>[...]
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/T73Q4GMQ6JZSBCXYGCEQGI3HOC4SSFNG/


[ovirt-users] Re: Unassigned hosts

2020-08-05 Thread Nardus Geldenhuys
Hi Strahil

Hope you are well. I got the following error when I tried to confirm the reboot:

Error while executing action: Cannot confirm 'Host has been rebooted' Host.
Valid Host statuses are "Non operational", "Maintenance" or "Connecting".

And I can't put it into maintenance; the only options are "restart" or "stop".

Regards

Nar
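The engine's error message spells out its precondition: "Confirm 'Host has been rebooted'" is accepted only for hosts in Non operational, Maintenance or Connecting status, so a host stuck in Unassigned or NonResponsive is rejected. A tiny model of that check, assuming Python 3; `can_confirm_reboot` and the status strings are illustrative only and are not the engine's actual (Java) validation code.

```python
# Statuses listed in the engine error message above; "Unassigned" and
# "NonResponsive" (the states reported in this thread) are not among them.
CONFIRM_REBOOT_ALLOWED = {"Non operational", "Maintenance", "Connecting"}


def can_confirm_reboot(host_status):
    """Return True if the status is one the error message lists as valid."""
    return host_status in CONFIRM_REBOOT_ALLOWED


for status in ("Unassigned", "NonResponsive", "Maintenance"):
    print(status, can_confirm_reboot(status))
```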

On Thu, 6 Aug 2020 at 06:16, Strahil Nikolov  wrote:

> [...]
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/NMQDDIGOIV7EDXO3EFHVU3ROKU44Y6ZY/


[ovirt-users] Re: Unassigned hosts

2020-08-05 Thread Strahil Nikolov via Users
After rebooting the node, did you "mark" it as rebooted?

Best Regards,
Strahil Nikolov

On 5 August 2020 21:29:04 GMT+03:00, Nardus Geldenhuys wrote:
>Hi oVirt land
>
>Hope you are well. We've got a bit of an issue, actually a big issue. We
>had some sort of dip. All the VMs are still running, but some of the
>hosts show "Unassigned" or "NonResponsive". All the hosts were showing
>UP and were fine before the dip. I increased vdsHeartbeatInSecond to
>240, with no luck.
>
>I still get a timeout on the engine lock, even though I can connect to
>that host from the engine using nc to test port 54321. I also restarted
>vdsmd and rebooted the host, with no luck.
>
> nc -v someserver 54321
>Ncat: Version 7.50 ( https://nmap.org/ncat )
>Ncat: Connected to 172.40.2.172:54321.
>
>2020-08-05 20:20:34,256+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-70) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM someserver command Get Host Capabilities failed: Message timeout which can be caused by communication issues
>
>Any troubleshooting ideas will be gladly appreciated.
>
>Regards
>
>Nar
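Since `python-ovirt-engine-sdk4` is installed on the engine (it appears in the package list earlier in this archive), the hosts stuck in Unassigned or NonResponsive can also be listed from a script rather than the UI. A minimal sketch, assuming the SDK's `Connection`, `system_service().hosts_service().list()` and `HostStatus` enum values such as `'non_responsive'`; the URL, credentials and CA path are placeholders, and `problem_hosts` / `list_problem_hosts` are hypothetical helpers. The SDK is imported lazily so `problem_hosts` works without it, and the code avoids f-strings so it also runs under the EL7 Python 2.7 that the packaged SDK targets.

```python
def problem_hosts(hosts, bad=("unassigned", "non_responsive")):
    """Return (name, status) for hosts whose status value is in `bad`."""
    found = []
    for host in hosts:
        # SDK statuses are enums; fall back to str() for plain values.
        status = getattr(host.status, "value", str(host.status))
        if status in bad:
            found.append((host.name, status))
    return found


def list_problem_hosts(url, username, password, ca_file):
    """Query the engine API and return hosts in a problem state."""
    import ovirtsdk4 as sdk  # lazy import: problem_hosts() needs no SDK

    connection = sdk.Connection(url=url, username=username,
                                password=password, ca_file=ca_file)
    try:
        hosts = connection.system_service().hosts_service().list()
        return problem_hosts(hosts)
    finally:
        connection.close()
```

Usage (placeholder values): `list_problem_hosts("https://<engine-fqdn>/ovirt-engine/api", "admin@internal", "<password>", "/etc/pki/ovirt-engine/ca.pem")`.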
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/QDTX76DFTZYON6QMAFICNDY3GJ6TR2UD/
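To see whether the "Message timeout" failures hit every host or only some of them, engine audit lines like the one in the original report can be counted per host. A minimal sketch, assuming Python 3 and that lines follow the `EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM <host> command <name> failed: ...` shape shown in this thread; the regular expression and `count_failures` are hypothetical and may need adjusting for other engine versions.

```python
import re
from collections import Counter

# Matches the audit-log shape quoted in this thread.
FAILURE_RE = re.compile(
    r"EVENT_ID: VDS_BROKER_COMMAND_FAILURE\(10,802\), VDSM (?P<host>\S+) command "
    r"(?P<command>.+?) failed: (?P<reason>.+)$"
)


def count_failures(lines):
    """Count VDS_BROKER_COMMAND_FAILURE events per (host, command)."""
    counts = Counter()
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            counts[(match.group("host"), match.group("command"))] += 1
    return counts
```

Usage: `with open("/var/log/ovirt-engine/engine.log") as f: print(count_failures(f).most_common(10))`.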