[ovirt-users] Re: How to debug "Non Operational" host

2021-11-24 Thread Gervais de Montbrun
rt1.dgi ~]# mount | grep storage
ovirt1-storage.dgi:/engine on /rhev/data-center/mnt/glusterSD/ovirt1-storage.dgi:_engine type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

I've noticed that on my working servers, I see this output:
[r...@ovirt2.dgi ~]# mount | grep storage
ovirt1-storage.dgi:/engine on /rhev/data-center/mnt/glusterSD/ovirt1-storage.dgi:_engine type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
ovirt1-storage.dgi:/vmstore on /rhev/data-center/mnt/glusterSD/ovirt1-storage.dgi:_vmstore type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

There is obviously something not mounting properly on ovirt1. I don't see how 
this can be a network issue as the storage for the hosted engine is working ok.
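
One way to make the comparison above repeatable is a small script that lists the Gluster FUSE mounts and reports any expected volume that is absent. This is only a sketch: the function names are made up for illustration, and the volume names are the two shown in the output above.

```shell
# List the Gluster volumes that are FUSE-mounted, reading `mount` output on stdin.
# Each relevant line looks like:
#   server:/volume on /mountpoint type fuse.glusterfs (options)
mounted_gluster_volumes() {
    awk '$2 == "on" && $4 == "type" && $5 == "fuse.glusterfs" { print $1 }' | sort
}

# Print every expected volume that is absent from the mount output on stdin.
# Usage: mount | missing_gluster_volumes server:/vol1 server:/vol2
missing_gluster_volumes() {
    mounted="$(mounted_gluster_volumes)"
    for vol in "$@"; do
        # -x matches whole lines only, -F treats the volume name literally.
        if ! printf '%s\n' "$mounted" | grep -qxF -- "$vol"; then
            echo "$vol"
        fi
    done
}
```

Run on each host as `mount | missing_gluster_volumes ovirt1-storage.dgi:/engine ovirt1-storage.dgi:/vmstore`. A host that prints nothing has both volumes mounted.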

I truly appreciate the help. Any other ideas or logs/places to check?

Cheers,
Gervais




[ovirt-users] Re: How to debug "Non Operational" host

2021-11-24 Thread Staniforth, Paul

Hi Gervais,

  The engine doesn't need to be able to ping the IP addresses; it just 
needs to be able to resolve them, so adding them to the /etc/hosts file should 
work.

Also, I would check ovirt1: is it mounting the brick? What does "systemctl 
status glusterd" show, and what is in the logs under /var/log/glusterfs?
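
As a rough sketch of the /etc/hosts check, the function below reports which names have no entry in a hosts file. The function name is hypothetical; the host names in the usage line are the three storage names from this thread.

```shell
# Print each given host name that has no entry in the hosts file.
# Usage: missing_from_hosts FILE name [name...]
missing_from_hosts() {
    hosts_file="$1"
    shift
    for name in "$@"; do
        # Strip comments, then look for the name among the host-name
        # fields (field 1 is the address, fields 2..NF are names).
        if ! awk -v want="$name" '
            { sub(/#.*/, "") }
            { for (i = 2; i <= NF; i++) if ($i == want) found = 1 }
            END { exit (found ? 0 : 1) }
        ' "$hosts_file"; then
            echo "$name"
        fi
    done
}
```

On the engine machine this could be run as `missing_from_hosts /etc/hosts ovirt1-storage.dgi ovirt2-storage.dgi ovirt3-storage.dgi`. For a check that also covers DNS, `getent hosts NAME` goes through the normal resolver order instead of reading one file.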


Regards,

Paul S.


[ovirt-users] Re: How to debug "Non Operational" host

2021-11-23 Thread Gervais de Montbrun
Hi Paul,

I don't quite get what you mean by this:

> assuming you have a storage network for the gluster nodes the engine needs to 
> be able to resolve the host addresses


The storage network is on 10GB network cards and plugged into a stand-alone 
switch. The hosted-engine is not on the same network at all and cannot ping 
the IPs associated with those cards. Are you saying that it needs access to 
that network, or that it needs to be able to resolve the IPs? I can add them 
to the /etc/hosts file on the ovirt-engine, or do I need to reconfigure my 
setup? It was working as it is currently configured before applying the update.

I have no idea why the ovirt1 server is not showing up with the FQDN. I set up 
all the servers the same way. It's been like that since I set things up. I have 
looked for where this might be corrected, but can't find it. Ideas?

The yellow bricks... I can force start them (and I have in the past), but now 
they turn green for a few minutes and then return to red.

Cheers,
Gervais




[ovirt-users] Re: How to debug "Non Operational" host

2021-11-23 Thread Staniforth, Paul
Hello Gervais,

   Is the brick mounted on ovirt1? Can you mount it using the 
settings in /etc/fstab?
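
A sketch of how that check could be scripted: compare the mount points listed in /etc/fstab with what is currently mounted. The function name is hypothetical, and the /gluster_bricks prefix in the usage line is an assumption taken from the brick path that appears in the engine log elsewhere in this thread.

```shell
# Print fstab mount points under PREFIX that are not currently mounted.
# Usage: unmounted_fstab_entries FSTAB MOUNTS PREFIX
#   FSTAB  - an fstab-format file, e.g. /etc/fstab
#   MOUNTS - a /proc/mounts-format file (field 2 is the mount point)
unmounted_fstab_entries() {
    fstab="$1"
    mounts="$2"
    prefix="$3"
    awk -v prefix="$prefix" '
        FILENAME == ARGV[1] { mounted[$2] = 1; next }   # first file: live mounts
        /^[[:space:]]*#/ || NF < 2 { next }             # skip comments and blanks
        index($2, prefix) == 1 && !($2 in mounted) { print $2 }
    ' "$mounts" "$fstab"
}
```

For example, `unmounted_fstab_entries /etc/fstab /proc/mounts /gluster_bricks` on ovirt1. Any mount point it prints can then be tried with `mount <mount point>`, which picks up the device and options from /etc/fstab.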

The hostname for ovirt1 is not using an FQDN.

Assuming you have a storage network for the gluster nodes, the engine needs to 
be able to resolve the host addresses

ovirt1-storage.dgi
ovirt2-storage.dgi
ovirt3-storage.dgi

So that it can assign them to the correct network.

When the volume is showing yellow, you can force start the bricks again from the GUI.

Regards,

Paul S.


From: Gervais de Montbrun 
Sent: 23 November 2021 13:42
To: Vojtech Juranek 
Cc: users@ovirt.org 
Subject: [ovirt-users] Re: How to debug "Non Operational" host


Hi Vojta,

Thanks for the help.

I tried to activate my server this morning and captured the logs from vdsm.log 
and engine.log. They are attached.

Something went awry with my gluster (I think) as it is showing that the bricks 
on the affected server (ovirt1) are not mounted:
[three inline screenshots omitted]


The networking looks fine.

Cheers,
Gervais




[ovirt-users] Re: How to debug "Non Operational" host

2021-11-23 Thread Vojtech Juranek
On Tuesday, 23 November 2021 14:42:31 CET Gervais de Montbrun wrote:
> Hi Vojta,
> 
> Thanks for the help.
> 
> I tried to activate my server this morning and captured the logs from
> vdsm.log and engine.log. They are attached.
> 
> Something went awry with my gluster (I think) as it is showing that the
> bricks on the affected server (ovirt1) are not mounted:

It seems not to be available, so vdsm fails with "OSError: [Errno 116] 
Stale file handle" and cannot mount it. I'd suggest investigating what's 
happening with your Gluster storage, and possibly trying to mount it manually 
from the affected machine. If you are able to mount it manually, vdsm should be 
able to mount it as well.
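
A minimal sketch of the manual mount test, assuming the vmstore volume name seen elsewhere in this thread and a scratch mount point chosen only for illustration. The helper names are hypothetical. With DRY_RUN=1 the commands are printed rather than executed.

```shell
# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Mount a Gluster volume on a scratch directory, list it, and unmount again.
# Stops at the first step that fails, so the failing command is the last one run.
# Usage: try_gluster_mount SERVER:/VOLUME MOUNTPOINT
try_gluster_mount() {
    volume="$1"
    target="$2"
    run mkdir -p "$target" &&
    run mount -t glusterfs "$volume" "$target" &&
    run ls "$target" &&
    run umount "$target"
}
```

As root on the affected machine, something like `try_gluster_mount ovirt1-storage.dgi:/vmstore /mnt/vmstore-test` would exercise the same FUSE mount outside of vdsm. The mount point name is an arbitrary example.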

Given the many warnings in the engine log, "Could not associate brick 
'ovirt1-storage.dgi:/gluster_bricks/vmstore/vmstore' of volume 
'2670ff29-8d43-4610-a437-c6ec2c235753' with correct network as no gluster 
network found in cluster '404c8d14-73c1-11eb-8755-00163e5907f6'", I'd probably 
first take a look at the network.
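
To gauge how widespread that warning is, something like the following could tally it per brick. It is a sketch only: the function name is hypothetical, and the pattern is taken from the warning text quoted above.

```shell
# Count "Could not associate brick" warnings per brick.
# Reads the engine log on stdin and prints "count message" lines,
# with the most frequently reported brick first.
brick_network_warnings() {
    grep -o "Could not associate brick '[^']*'" | sort | uniq -c | sort -rn
}
```

On the engine machine: `brick_network_warnings < /var/log/ovirt-engine/engine.log`.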

Vojta


___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/PZ5HTE5WOZYE4FGAUNI472IW7NE56M5V/


[ovirt-users] Re: How to debug "Non Operational" host

2021-11-22 Thread Vojtech Juranek
On Tuesday, 23 November 2021 03:36:07 CET Gervais de Montbrun wrote:
> Hi Folks,
> 
> I did a minor upgrade on the first host in my cluster and now it is
> reporting "Non Operational"
> 
> This is what yum showed as updatable. However, I did the update through the
> ovirt-engine web interface.
> 
> ovirt-node-ng-image-update.noarch                 4.4.9-1.el8     ovirt-4.4
> Obsoleting Packages
> ovirt-node-ng-image-update.noarch                 4.4.9-1.el8     ovirt-4.4
>     ovirt-node-ng-image-update.noarch             4.4.8.3-1.el8   @System
> ovirt-node-ng-image-update.noarch                 4.4.9-1.el8     ovirt-4.4
>     ovirt-node-ng-image-update-placeholder.noarch 4.4.8.3-1.el8   @System
> 
> How do I start to debug this issue?

Check the engine log in /var/log/ovirt-engine/engine.log on the machine where 
the engine runs.

> 
> 
> Also, it looks like the vmstore brick is not mounting on that host. I only
> see the engine mounted.


Could you also attach the relevant part of the vdsm log (/var/log/vdsm/vdsm.log) 
from the machine where the mount failed? You should see some mount-related 
error there. This could also be the reason why the host became non-operational.
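
A rough sketch of how the mount-related lines could be pulled out of the log before attaching it. The function name is hypothetical, and the filter assumes that failing lines carry an ERROR or WARN level or a Python OSError. The "stale file handle" pattern is included because that error is reported elsewhere in this thread.

```shell
# Print numbered log lines that look like mount failures; reads the log on stdin.
# Assumption: relevant lines mention "mount" or "stale file handle" and also
# contain ERROR, WARN, or OSError.
mount_errors() {
    grep -n -i -E 'mount|stale file handle' | grep -E 'ERROR|WARN|OSError'
}
```

On the affected host: `mount_errors < /var/log/vdsm/vdsm.log`. The line numbers in the output make it easier to cut out the surrounding context.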

Thanks
Vojta

> Broken server:
> [r...@ovirt1.dgi log]# mount | grep storage
> ovirt1-storage.dgi:/engine on /rhev/data-center/mnt/glusterSD/ovirt1-storage.dgi:_engine type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
> 
> Working server:
> [r...@ovirt2.dgi ~]# mount | grep storage
> ovirt1-storage.dgi:/engine on /rhev/data-center/mnt/glusterSD/ovirt1-storage.dgi:_engine type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
> ovirt1-storage.dgi:/vmstore on /rhev/data-center/mnt/glusterSD/ovirt1-storage.dgi:_vmstore type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
> 
> 
> I tried putting the server into maintenance mode and running a reinstall on
> it. No change. I'd really appreciate some help sorting this out.
> 
> Cheers,
> Gervais



List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/S6C7R6LUTJXFMG7WIODA53VEU4O7ZNHJ/