Hi

Should I use threaddump_linux.sh.tar.gz?
from:

https://access.redhat.com/solutions/18178


> On 9 Aug 2021, at 17:56, Artur Socha <aso...@redhat.com> wrote:
> 
> Actually, you could even take three thread dumps at 30-second intervals.
> Artur
> 
> On Mon, Aug 9, 2021 at 4:53 PM Artur Socha <aso...@redhat.com> wrote:
> Unfortunately, I don't see anything wrong in either the engine or the vdsm logs.
> There is one last thing that comes to mind that you could try: restarting the
> engine service. That is exactly the case I have been investigating.
> But before restarting, I would like to ask you, if possible, for a Java (JVM)
> thread dump.
> The procedure is as follows:
> 1) Find the JBoss PID, e.g.:
> $ ps -ef | grep jboss | grep -v grep | awk '{ print $2 }'
> 2) Trigger a thread dump:
> $ kill -3 <jboss-pid>
> 3) The thread dump output can be found in /var/log/ovirt-engine/console.log
> 
> Then restart the engine service to check whether that helps.
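The steps above can be scripted roughly as follows. This is a minimal sketch, assuming it runs as root on the engine machine and that exactly one process has "jboss" in its command line. It uses pgrep -f in place of the ps | grep | awk pipeline, and the helper names find_jboss_pid and take_thread_dumps are hypothetical. The defaults (three dumps, 30 seconds apart) follow the suggestion to take three dumps at 30-second intervals.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of oVirt): take JVM thread dumps of ovirt-engine.

# Print the PID of the JBoss process; fail unless exactly one process matches.
# Assumption: the engine's JVM is the only process with "jboss" in its command line.
find_jboss_pid() {
    local pids count
    pids=$(pgrep -f jboss || true)
    count=$(printf '%s\n' "$pids" | grep -c . || true)
    if [ "$count" -ne 1 ]; then
        echo "expected exactly one jboss process, found $count" >&2
        return 1
    fi
    printf '%s\n' "$pids"
}

# Send SIGQUIT (kill -3) to PID $1, $2 times (default 3), $3 seconds apart
# (default 30). The JVM prints each dump to its stdout, which for ovirt-engine
# ends up in /var/log/ovirt-engine/console.log.
take_thread_dumps() {
    local pid=$1 count=${2:-3} interval=${3:-30} i=1
    while [ "$i" -le "$count" ]; do
        # Stop early if the process has gone away between dumps.
        if ! kill -0 "$pid" 2>/dev/null; then
            echo "process '$pid' is not running" >&2
            return 1
        fi
        kill -3 "$pid"
        echo "requested thread dump $i/$count from PID $pid" >&2
        # No need to wait after the last dump.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
}
```

Usage, as root on the engine machine:
$ pid=$(find_jboss_pid) && take_thread_dumps "$pid" 3 30
Each HotSpot dump in console.log starts with a "Full thread dump" header line, which makes the three dumps easy to locate and compare.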
> 
> Artur
> 
> 
> On Mon, Aug 9, 2021 at 2:19 PM Andrei Verovski <andre...@starlett.lv> wrote:
> Hi, Artur,
> 
> A small update on the vdsm status, which I forgot to include in my previous post.
> 
> I have partially fixed the problem with VDSM startup.
> 
> The bug "Failed to create session: Start job for unit user-0.slice failed with 'canceled'"
> is described here:
> https://bugzilla.redhat.com/show_bug.cgi?id=1967962
> and a fix seems to be available here, so I have downgraded systemd to the build
> with the backported fix:
> http://people.redhat.com/dtardon/systemd/bz1642460-backport-UserStopDelaySec=/
> 
> Now the vdsmd service starts successfully, but node14 still cannot be activated
> because of the same error. This is quite strange: before the restart on Friday,
> the node just worked. There were no upgrades, nothing, just a restart.
> 
> [root@node14 ~]# service vdsmd status
> Redirecting to /bin/systemctl status vdsmd.service
> ● vdsmd.service - Virtual Desktop Server Manager
>    Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: disabled)
>    Active: active (running) since Mon 2021-08-09 15:12:59 EEST; 4min 20s ago
>   Process: 4066 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
>  Main PID: 4130 (vdsmd)
>     Tasks: 41 (limit: 615525)
>    Memory: 59.5M
>    CGroup: /system.slice/vdsmd.service
>            └─4130 /usr/bin/python3 /usr/share/vdsm/vdsmd
> 
> Aug 09 15:12:55 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running prepare_transient_repository
> Aug 09 15:12:57 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running syslog_available
> Aug 09 15:12:57 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running nwfilter
> Aug 09 15:12:58 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running dummybr
> Aug 09 15:12:58 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running tune_system
> Aug 09 15:12:58 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running test_space
> Aug 09 15:12:59 node14.***.lv vdsmd_init_common.sh[4066]: vdsm: Running test_lo
> Aug 09 15:12:59 node14.***.lv systemd[1]: Started Virtual Desktop Server Manager.
> Aug 09 15:13:00 node14.***.lv vdsm[4130]: WARN MOM not available. Error: [Errno 111] Connection refused
> Aug 09 15:13:00 node14.***.lv vdsm[4130]: WARN MOM not available, KSM stats will be missing. Error:
> 
> 
> [root@node14]# firewall-cmd --list-all
> public (active)
>   target: default
>   icmp-block-inversion: no
>   interfaces: DMZ_node14 eno1 eno2 ovirtmgmt
>   sources: 
>   services: cockpit dhcpv6-client libvirt-tls mountd nfs ovirt-imageio ovirt-vmconsole rpc-bind snmp ssh vdsm
>   ports: 2301/tcp 2381/tcp 22/tcp 6081/udp
>   protocols: 
>   forward: no
>   masquerade: no
>   forward-ports: 
>   source-ports: 
>   icmp-blocks: 
>   rich rules: 
> [root@node14 andrei]# 
> 
> 
> The output of vdsm-client Host getStats and vdsm-client Host getCapabilities is attached.
> 
> 
> 
> 
>> On 9 Aug 2021, at 13:18, Artur Socha <aso...@redhat.com> wrote:
>> 
>> Thanks for the logs. I am checking them at the moment. So far I have noticed
>> that node14 is serving an NFS share which had been marked as problematic
>> (probably because of the downtime during the migration), but it has
>> recovered.
>> 
>> In the meantime, is it possible to get some meaningful results when calling:
>> $ vdsm-client Host getStats
>> and 
>> $ vdsm-client Host getCapabilities 
>> on node14?
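On the host, the two calls can be captured into files for attaching to a reply. A minimal sketch, assuming it runs as root on node14; the function name collect_vdsm_info, the output file names and the VDSM_CLIENT override variable are hypothetical (the variable only exists so the command can be swapped out, e.g. for testing).

```shell
#!/usr/bin/env bash
# Hypothetical helper: save "vdsm-client Host getStats" and
# "vdsm-client Host getCapabilities" output into files under directory $1.
# VDSM_CLIENT is a hypothetical override; by default the real vdsm-client runs.
collect_vdsm_info() {
    local outdir=${1:-.} client=${VDSM_CLIENT:-vdsm-client} rc=0 verb out
    for verb in getStats getCapabilities; do
        out="$outdir/host-$verb.out"
        # Keep stderr too, so a timeout or connection error ends up in the file.
        if "$client" Host "$verb" > "$out" 2>&1; then
            echo "saved $out"
        else
            echo "vdsm-client Host $verb failed; see $out" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

Usage: run "$ collect_vdsm_info /tmp" and attach /tmp/host-getStats.out and /tmp/host-getCapabilities.out. A non-zero return means at least one call failed, which is itself a useful data point given the "Message timeout" error seen by the engine.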
>> 
>> What is the state of the vdsmd service when running systemctl status vdsmd?
>> One other thing to rule out is networking/firewall. Here is the list of the
>> ports that need to be open on the host (the documentation is for the
>> self-hosted engine, but it applies to a standalone setup as well):
>> https://www.ovirt.org/documentation/installing_ovirt_as_a_self-hosted_engine_using_the_command_line/index.html#host-firewall-requirements_SHE_cli_deploy
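A quick reachability check from the engine machine can complement the firewall listing on the host. This is a sketch under assumptions: it needs bash (for /dev/tcp redirections) and coreutils timeout, the helper names check_tcp_port and check_host_ports are hypothetical, and the default ports (22 for SSH, 54321 for VDSM) are only a subset of the requirements in the linked documentation.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: test TCP reachability of a host's ports from this machine.

# Succeed if a TCP connection to host $1, port $2 opens within $3 seconds (default 5).
check_tcp_port() {
    timeout "${3:-5}" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Check each port given after the host name (default: 22 and 54321) and print
# one line per port; return non-zero if any port is unreachable.
check_host_ports() {
    local host=$1 failed=0 port
    shift
    [ "$#" -gt 0 ] || set -- 22 54321
    for port in "$@"; do
        if check_tcp_port "$host" "$port"; then
            echo "$host:$port open"
        else
            echo "$host:$port NOT reachable"
            failed=1
        fi
    done
    return "$failed"
}
```

Usage, on the engine machine: "$ check_host_ports node14.mydomain.lv 22 54321". An open port only proves that the TCP handshake succeeds; it says nothing about the TLS layer that engine-to-vdsm traffic normally uses on top of it.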
>> 
>> BTW, I have been hunting a rare, hard-to-reproduce bug for quite a long
>> time (without success so far), so any reported connectivity issues between
>> the manager and hosts are super interesting to me.
>> 
>> Artur
>> 
>> On Mon, Aug 9, 2021 at 11:44 AM Andrei Verovski <andreil1@***.lv> wrote:
>> Hi, Artur,
>> 
>> 
>> Thanks for your assistance. The zipped engine logs, starting from the day of
>> the upgrade, are attached.
>> Restart via SSH from the oVirt web GUI works.
>> The oVirt engine runs on a dedicated server; it is not a hosted engine.
>> 
>> 
>> 
>> 
>>> On 9 Aug 2021, at 11:24, Artur Socha <aso...@redhat.com> wrote:
>>> 
>>> Hi Andrei,
>>> Could you also post a relevant piece of engine.log? I don't have high
>>> expectations of finding the answer there, but I just want to be sure.
>>> vdsm.log does not show any trace of an error from the vdsm point of view. For
>>> example, it looks like it started correctly and subscribed to receive
>>> commands from the engine (though that does not mean the engine has connected
>>> to it; vdsm is only in listening mode).
>>> 
>>> Can you confirm that 'SSH restart' from the UI works? By 'works' I mean that
>>> the host is actually restarted after a few minutes and there are no
>>> SSH-related (public key, etc.) errors in engine.log.
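To check the second half of that question, engine.log can be filtered for SSH-related failures. A rough heuristic sketch: the function name ssh_errors_in_engine_log is hypothetical, the default path assumes the standard /var/log/ovirt-engine/engine.log location, and the keyword match can both miss relevant lines and include unrelated ones.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print engine.log lines that mention SSH and look like errors.
# Heuristic only: matches "ssh" plus error/fail/denied/refused, case-insensitively.
ssh_errors_in_engine_log() {
    local log=${1:-/var/log/ovirt-engine/engine.log}
    grep -i 'ssh' "$log" | grep -iE 'error|fail|denied|refused'
}
```

Usage, on the engine machine: "$ ssh_errors_in_engine_log | tail -n 20" shows the most recent matches, which can then be lined up with the time of the SSH restart attempt.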
>>> 
>>> Artur
>>> 
>>> On Mon, Aug 9, 2021 at 9:55 AM Andrei Verovski <andreil1@***.lv> wrote:
>>> Hi,
>>> 
>>> I have oVirt 4.4.7.6-1.el8 and one problematic node (HP ProLiant with
>>> CentOS 8 Stream).
>>> After replacing the server rack's router switch and restarting, I got this
>>> error that I can't recover from:
>>> 
>>> VDSM node14 command Get Host Capabilities failed: Message timeout which can be caused by communication issues
>>> 
>>> vdsm-network is running fine, but vdsmd can't start on node14 for some
>>> reason. All other nodes are running fine.
>>> 
>>> Aug 09 10:24:12 node14.mydomain.lv vdsmd_init_common.sh[4825]: vdsm: Running dummybr
>>> Aug 09 10:24:13 node14.mydomain.lv vdsmd_init_common.sh[4825]: vdsm: Running tune_system
>>> Aug 09 10:24:13 node14.mydomain.lv vdsmd_init_common.sh[4825]: vdsm: Running test_space
>>> Aug 09 10:24:13 node14.mydomain.lv vdsmd_init_common.sh[4825]: vdsm: Running test_lo
>>> Aug 09 10:24:13 node14.mydomain.lv systemd[1]: Started Virtual Desktop Server Manager.
>>> Aug 09 10:24:16 node14.mydomain.lv sudo[7721]: pam_systemd(sudo:session): Failed to create session: Start job for unit user-0.slice failed with 'canceled'
>>> Aug 09 10:24:16 node14.mydomain.lv sudo[7721]: pam_unix(sudo:session): session opened for user root by (uid=0)
>>> Aug 09 10:24:16 node14.mydomain.lv sudo[7721]: pam_unix(sudo:session): session closed for user root
>>> Aug 09 10:24:17 node14.mydomain.lv vdsm[6754]: WARN MOM not available. Error: [Errno 2] No such file or directory
>>> Aug 09 10:24:17 node14.mydomain.lv vdsm[6754]: WARN MOM not available, KSM stats will be missing. Error:
>>> 
>>> 
>>> In the web GUI -> Management, I can't do anything with the host except restart.
>>> Stop aborts with an error, and all other commands are grayed out.
>>> The status is "Unassigned". The host answers pings as usual.
>>> vdsm.log (from node14) attached.
>>> 
>>> Thanks in advance for any help.
>>> 
>>> 
>>> _______________________________________________
>>> Users mailing list -- users@ovirt.org
>>> To unsubscribe send an email to users-le...@ovirt.org
>>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
>>> oVirt Code of Conduct: 
>>> https://www.ovirt.org/community/about/community-guidelines/
>>> List Archives: 
>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/55M65W57Z43ZVPOARDTK7HKHCAMAUGO5/
>>> 
>>> 
>>> -- 
>>> Artur Socha
>>> Senior Software Engineer, RHV
>>> Red Hat
>> 
> 
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/U5VAABEJQDFHNXBPUUZSJREYBMRWAEY4/
