[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active

2020-09-24 Thread Ritesh Chikatwar
On Wed, Sep 23, 2020 at 10:16 PM Strahil Nikolov via Users 
wrote:

>
> >1) ..."I would give the engine a 'Windows'-style fix (a.k.a.
> reboot)"  >how does one restart just the oVirt-engine?
> ssh to HostedEngine VM and run one of the following:
> - reboot
> - systemctl restart ovirt-engine.service
>
> > 2) I now show in shell 3 nodes, each with the one brick for data, vmstore,
> > engine (and an ISO one I am trying to make).. with one brick each and all
> > online and replicating.  But the GUI shows thor (first server running
> > engine) offline needing to be reloaded.  Now volumes show two bricks.. one
> > online one offline.  And no option to start / force restart.
> If it shows one offline brick -> you can try the "force start". You can go
> to UI -> Storage -> Volume -> select Volume -> Start and then mark "Force"
> and "OK"
>
>
> > 4) To the question of "did I add third node later."  I would attach
> > deployment guide I am building ... but can't do that in this forum.  but
> > this is as simple as I can make it.  3 intel generic servers,  1 x boot
> > drive, 1 x 512GB SSD,  2 x 1TB SSD in each.  wipe all data all
> > configuration fresh Centos8 minimal install.. setup SSH setup basic
> > networking... install cockpit.. run HCI wizard for all three nodes. That
> > is all.
>
> >How many hosts do you see in oVirt ?
> >Help is appreciated.  The main concern I have is gap in what engine sees
> >and what CLI shows.  Can someone show me where to get logs?  the GUI log
> >when I try to "activate" thor server "Status of host thor was set to
> >NonOperational."  "Gluster command [] failed on server
> >."  is very unhelpful.
> Check the following services on the node:
> - glusterd.service
>
If the glusterd service is running on the host, check whether vdsm-client
returns the gluster host UUID. Just run this on the host: "vdsm-client
--gluster-enable GlusterHost uuid".
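For comparison, the UUID that glusterd itself holds can be read directly on the
same host (a quick sketch using the stock glusterd CLI and state-file path, not
an oVirt-specific tool):

# UUID as reported by the gluster CLI
gluster system:: uuid get
# ...or straight from glusterd's state file
grep ^UUID= /var/lib/glusterd/glusterd.info

If the vdsm-client call above errors out or returns a different UUID, the
engine cannot match the host to its gluster peer entry, which could explain the
"Gluster command failed" status.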

> - sanlock.service
> - supervdsmd.service
> - vdsmd.service
> - ovirt-ha-broker.service
> - ovirt-ha-agent.service
>
> Best Regards,
> Strahil Nikolov
>


[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active

2020-09-23 Thread Gobinda Das
We do have a gluster volume UI sync issue, and it is fixed in ovirt-4.4.2
(BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1860775).
This could be the same issue.

On Wed, Sep 23, 2020 at 10:16 PM Strahil Nikolov via Users 
wrote:

>
> >1) ..."I would give the engine a 'Windows'-style fix (a.k.a.
> reboot)"  >how does one restart just the oVirt-engine?
> ssh to HostedEngine VM and run one of the following:
> - reboot
> - systemctl restart ovirt-engine.service
>
> > 2) I now show in shell 3 nodes, each with the one brick for data, vmstore,
> > engine (and an ISO one I am trying to make).. with one brick each and all
> > online and replicating.  But the GUI shows thor (first server running
> > engine) offline needing to be reloaded.  Now volumes show two bricks.. one
> > online one offline.  And no option to start / force restart.
> If it shows one offline brick -> you can try the "force start". You can go
> to UI -> Storage -> Volume -> select Volume -> Start and then mark "Force"
> and "OK"
>
>
> > 4) To the question of "did I add third node later."  I would attach
> > deployment guide I am building ... but can't do that in this forum.  but
> > this is as simple as I can make it.  3 intel generic servers,  1 x boot
> > drive, 1 x 512GB SSD,  2 x 1TB SSD in each.  wipe all data all
> > configuration fresh Centos8 minimal install.. setup SSH setup basic
> > networking... install cockpit.. run HCI wizard for all three nodes. That
> > is all.
>
> >How many hosts do you see in oVirt ?
> >Help is appreciated.  The main concern I have is gap in what engine sees
> >and what CLI shows.  Can someone show me where to get logs?  the GUI log
> >when I try to "activate" thor server "Status of host thor was set to
> >NonOperational."  "Gluster command [] failed on server
> >."  is very unhelpful.
> Check the following services on the node:
> - glusterd.service
> - sanlock.service
> - supervdsmd.service
> - vdsmd.service
> - ovirt-ha-broker.service
> - ovirt-ha-agent.service
>
> Best Regards,
> Strahil Nikolov
>


-- 


Thanks,
Gobinda


[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active

2020-09-23 Thread Strahil Nikolov via Users

>1) ..."I would give the engine a 'Windows'-style fix (a.k.a. reboot)"  
>>how does one restart just the oVirt-engine?
ssh to HostedEngine VM and run one of the following:
- reboot
- systemctl restart ovirt-engine.service
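If in doubt whether the engine came back cleanly after the restart, a quick
check (the URL below assumes the standard oVirt health servlet; replace
localhost with the engine FQDN if needed):

systemctl is-active ovirt-engine.service
curl -k https://localhost/ovirt-engine/services/health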

> 2) I now show in shell 3 nodes, each with the one brick for data, vmstore,
> engine (and an ISO one I am trying to make).. with one brick each and all
> online and replicating.  But the GUI shows thor (first server running
> engine) offline needing to be reloaded.  Now volumes show two bricks.. one
> online one offline.  And no option to start / force restart.
If it shows one offline brick -> you can try the "force start". You can go to
UI -> Storage -> Volume -> select Volume -> Start and then mark "Force" and "OK".
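The CLI equivalent of that UI "force start", run from any gluster node (the
volume name here is just an example):

# start any bricks of the volume that are down; safe on an already-started volume
gluster volume start vmstore force
# then confirm every brick shows Online = Y
gluster volume status vmstore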


> 4) To the question of "did I add third node later."  I would attach
> deployment guide I am building ... but can't do that in this forum.  but
> this is as simple as I can make it.  3 intel generic servers,  1 x boot
> drive, 1 x 512GB SSD,  2 x 1TB SSD in each.  wipe all data all
> configuration fresh Centos8 minimal install.. setup SSH setup basic
> networking... install cockpit.. run HCI wizard for all three nodes. That
> is all.

> How many hosts do you see in oVirt ?
> Help is appreciated.  The main concern I have is gap in what engine sees and
> what CLI shows.  Can someone show me where to get logs?  the GUI log when I
> try to "activate" thor server "Status of host thor was set to
> NonOperational."  "Gluster command [] failed on server ." is very unhelpful.
Check the following services on the node:
- glusterd.service
- sanlock.service
- supervdsmd.service
- vdsmd.service
- ovirt-ha-broker.service
- ovirt-ha-agent.service
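A one-liner sketch to check that whole list in one pass (same unit names as
above):

for s in glusterd sanlock supervdsmd vdsmd ovirt-ha-broker ovirt-ha-agent; do
    printf '%-16s %s\n' "$s" "$(systemctl is-active $s)"
done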

Best Regards,
Strahil Nikolov


[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active

2020-09-23 Thread Jeremey Wise
In oVirt Engine I think I see some of the issue.

When you go under volumes -> Data ->

[image: image.png]

It notes two servers.. when you choose "add brick" it says the volume has 3
bricks but only two servers.

So I went back to my deployment notes and walked through setup

yum install https://resources.ovirt.org/pub/yum-repo/ovirt-release44.rpm -y

yum install -y cockpit-ovirt-dashboard vdsm-gluster ovirt-host

Last metadata expiration check: 1:59:46 ago on Wed 23 Sep 2020 06:10:46 AM
EDT.
Package cockpit-ovirt-dashboard-0.14.11-1.el8.noarch is already installed.
Package ovirt-host-4.4.1-4.el8.x86_64 is already installed.
Dependencies resolved.
=
 Package
 ArchitectureVersion
  Repository
   Size
=
Installing:
 vdsm-gluster   x86_64
 4.40.26.3-1.el8
  ovirt-4.4
  67 k
Installing dependencies:
 blivet-datanoarch
 1:3.1.0-21.el8_2
 AppStream
 238 k
 glusterfs-events   x86_64
 7.7-1.el8
  ovirt-4.4-centos-gluster7
  65 k
 glusterfs-geo-replication  x86_64
 7.7-1.el8
  ovirt-4.4-centos-gluster7
 212 k
 libblockdev-plugins-allx86_64
 2.19-12.el8
  AppStream
  62 k
 libblockdev-vdox86_64
 2.19-12.el8
  AppStream
  74 k
 python3-blivet noarch
 1:3.1.0-21.el8_2
 AppStream
 995 k
 python3-blockdev   x86_64
 2.19-12.el8
  AppStream
  79 k
 python3-bytesize   x86_64
 1.4-3.el8
  AppStream
  28 k
 python3-magic  noarch
 5.33-13.el8
  BaseOS
   45 k
 python3-pyparted   x86_64
 1:3.11.0-13.el8
  AppStream
 123 k


Dependencies resolved.
Nothing to do.
Complete!
[root@thor media]#



AKA.. something got removed from the node..


Rebooted.. as I am not sure which dependencies and services would need to
be restarted to get oVirt-engine to pick things up.


Host is now "green" .. now only errors are about gluster bricks..



On Tue, Sep 22, 2020 at 9:30 PM penguin pages 
wrote:

>
>
> eMail client with this forum is a bit .. I was told this web
> interface I could post images... as embedded ones in email get scraped
> out...  but not seeing how that is done. Seems to be txt only.
>
>
>
> 1) ..."I would give the engine a 'Windows'-style fix (a.k.a. reboot)"
> how does one restart just the oVirt-engine?
>
> 2) I now show in shell  3 nodes, each with the one brick for data,
> vmstore, engine (and an ISO one I am trying to make).. with one brick each
> and all online and replicating.   But the GUI shows thor (first server
> running engine) offline needing to be reloaded.  Now volumes show two
> bricks.. one online one offline.  And no option to start / force restart.
>
> 3) I have tried several times to try a graceful reboot to see if startup
> sequence was issue.   I tore down VLANs and bridges to make it flat 1 x 1Gb
> mgmt, 1 x 10Gb storage.   SSH between nodes is fine... copy test was
> great.   I don't think it is nodes.
>
> 4) To the question of "did I add third node later."  I would attach
> deployment guide I am building ... but can't do that in this forum.  but
> this is as simple as I can make it.  3 intel generic servers,  1 x boot
> drive , 1 x 512GB SSD,  2 x 1TB SSD in each.   wipe all data all
> configuration fresh Centos8 minimal install.. setup SSH setup basic
> networking... install cockpit.. run HCI wizard for all three nodes. That is
> all.
>
> Trying to learn and support concept of oVirt as a viable platform but
> still trying to work through learning how to root cause, kick tires, and
> debug / recover when things go down .. as they will.
>
> Help is appreciated.  The main concern I have is gap in what engine sees
> 

[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active

2020-09-22 Thread penguin pages


The eMail client with this forum is a bit .. I was told that through this web
interface I could post images... as embedded ones in email get scraped out...
but I am not seeing how that is done. Seems to be txt only.



1) ..."I would give the engine a 'Windows'-style fix (a.k.a. reboot)"  how 
does one restart just the oVirt-engine?

2) I now show in shell  3 nodes, each with the one brick for data, vmstore, 
engine (and an ISO one I am trying to make).. with one brick each and all 
online and replicating.   But the GUI shows thor (first server running engine) 
offline needing to be reloaded.  Now volumes show two bricks.. one online one 
offline.  And no option to start / force restart.

3) I have tried a graceful reboot several times to see if the startup sequence
was the issue.  I tore down VLANs and bridges to make it flat: 1 x 1Gb mgmt,
1 x 10Gb storage.  SSH between nodes is fine... the copy test was great.  I
don't think it is the nodes.

4) To the question of "did I add third node later."  I would attach deployment 
guide I am building ... but can't do that in this forum.  but this is as simple 
as I can make it.  3 intel generic servers,  1 x boot drive , 1 x 512GB SSD,  2 
x 1TB SSD in each.   wipe all data all configuration fresh Centos8 minimal 
install.. setup SSH setup basic networking... install cockpit.. run HCI wizard 
for all three nodes. That is all.

Trying to learn and support concept of oVirt as a viable platform but still 
trying to work through learning how to root cause, kick tires, and debug / 
recover when things go down .. as they will.

Help is appreciated.  The main concern I have is the gap between what the
engine sees and what the CLI shows.  Can someone show me where to get logs?
The GUI log when I try to "activate" the thor server - "Status of host thor was
set to NonOperational."  "Gluster command [] failed on server ." - is very
unhelpful.



[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active

2020-09-22 Thread Strahil Nikolov via Users
oVirt uses the "/rhev/mnt..." mountpoints.

Do you have those (for each storage domain)?

Here is an example from one of my nodes:
[root@ovirt1 ~]# df -hT | grep rhev
gluster1:/engine  fuse.glusterfs  100G   19G   82G  19%  /rhev/data-center/mnt/glusterSD/gluster1:_engine
gluster1:/fast4   fuse.glusterfs  100G   53G   48G  53%  /rhev/data-center/mnt/glusterSD/gluster1:_fast4
gluster1:/fast1   fuse.glusterfs  100G   56G   45G  56%  /rhev/data-center/mnt/glusterSD/gluster1:_fast1
gluster1:/fast2   fuse.glusterfs  100G   56G   45G  56%  /rhev/data-center/mnt/glusterSD/gluster1:_fast2
gluster1:/fast3   fuse.glusterfs  100G   55G   46G  55%  /rhev/data-center/mnt/glusterSD/gluster1:_fast3
gluster1:/data    fuse.glusterfs  2.4T  535G  1.9T  23%  /rhev/data-center/mnt/glusterSD/gluster1:_data



Best Regards,
Strahil Nikolov


On Tuesday, 22 September 2020 at 19:44:54 GMT+3, Jeremey Wise wrote:






Yes.

And at one time it was fine.  I did a graceful shutdown.. and after booting it
now always seems to have an issue with the one server... of course the one
hosting the ovirt-engine :P

# Three nodes in cluster

# Error when you hover over node


# when i select node and choose "activate"



#Gluster is working fine... this is oVirt who is confused.
[root@medusa vmstore]# mount |grep media/vmstore
medusast.penguinpages.local:/vmstore on /media/vmstore type fuse.glusterfs 
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072,_netdev)
[root@medusa vmstore]# echo > /media/vmstore/test.out
[root@medusa vmstore]# ssh -f thor 'echo $HOSTNAME >> /media/vmstore/test.out'
[root@medusa vmstore]# ssh -f odin 'echo $HOSTNAME >> /media/vmstore/test.out'
[root@medusa vmstore]# ssh -f medusa 'echo $HOSTNAME >> /media/vmstore/test.out'
[root@medusa vmstore]# cat /media/vmstore/test.out

thor.penguinpages.local
odin.penguinpages.local
medusa.penguinpages.local


Ideas to fix oVirt?



On Tue, Sep 22, 2020 at 10:42 AM Strahil Nikolov  wrote:
> By the way, did you add the third host in the oVirt ?
> 
> If not , maybe that is the real problem :)
> 
> 
> Best Regards,
> Strahil Nikolov
> 
> 
> 
> 
> 
> 
> On Tuesday, 22 September 2020 at 17:23:28 GMT+3, Jeremey Wise wrote:
> 
> 
> 
> 
> 
> Its like oVirt thinks there are only two nodes in gluster replication
> 
> 
> 
> 
> 
> # Yet it is clear the CLI shows three bricks.
> [root@medusa vms]# gluster volume status vmstore
> Status of volume: vmstore
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> --
> Brick thorst.penguinpages.local:/gluster_br
> icks/vmstore/vmstore                        49154     0          Y       9444
> Brick odinst.penguinpages.local:/gluster_br
> icks/vmstore/vmstore                        49154     0          Y       3269
> Brick medusast.penguinpages.local:/gluster_
> bricks/vmstore/vmstore                      49154     0          Y       7841
> Self-heal Daemon on localhost               N/A       N/A        Y       80152
> Self-heal Daemon on odinst.penguinpages.loc
> al                                          N/A       N/A        Y       
> 141750
> Self-heal Daemon on thorst.penguinpages.loc
> al                                          N/A       N/A        Y       
> 245870
> 
> Task Status of Volume vmstore
> --
> There are no active volume tasks
> 
> 
> 
> How do I get oVirt to re-establish reality to what Gluster sees?
> 
> 
> 
> On Tue, Sep 22, 2020 at 8:59 AM Strahil Nikolov  wrote:
>> Also in some rare cases, I have seen oVirt showing gluster as 2 out of 3 
>> bricks up , but usually it was an UI issue and you go to UI and mark a 
>> "force start" which will try to start any bricks that were down (won't 
>> affect gluster) and will wake up the UI task to verify again brick status.
>> 
>> 
>> https://github.com/gluster/gstatus is a good one to verify your cluster 
>> health , yet human's touch is priceless in any kind of technology.
>> 
>> Best Regards,
>> Strahil Nikolov
>> 
>> 
>> 
>> 
>> 
>> 
>> On Tuesday, 22 September 2020 at 15:50:35 GMT+3, Jeremey Wise wrote:
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> when I posted last..  in the tread I paste a roling restart.    And...  now 
>> it is replicating.
>> 
>> oVirt still showing wrong.  BUT..   I did my normal test from each of the 
>> three nodes.
>> 
>> 1) Mount Gluster file system with localhost as primary and other two as 
>> tertiary to local mount (like a client would do)
>> 2) run test file create Ex:   echo $HOSTNAME >> /media/glustervolume/test.out
>> 3) repeat from each node then read back that all are in sync.
>> 
>> I REALLY hate reboot (restart) as a fix.  I need to get better with root 
>> 

[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active

2020-09-22 Thread Strahil Nikolov via Users
By the way, did you add the third host in the oVirt ?

If not , maybe that is the real problem :)


Best Regards,
Strahil Nikolov






On Tuesday, 22 September 2020 at 17:23:28 GMT+3, Jeremey Wise wrote:





Its like oVirt thinks there are only two nodes in gluster replication





# Yet it is clear the CLI shows three bricks.
[root@medusa vms]# gluster volume status vmstore
Status of volume: vmstore
Gluster process                             TCP Port  RDMA Port  Online  Pid
--
Brick thorst.penguinpages.local:/gluster_br
icks/vmstore/vmstore                        49154     0          Y       9444
Brick odinst.penguinpages.local:/gluster_br
icks/vmstore/vmstore                        49154     0          Y       3269
Brick medusast.penguinpages.local:/gluster_
bricks/vmstore/vmstore                      49154     0          Y       7841
Self-heal Daemon on localhost               N/A       N/A        Y       80152
Self-heal Daemon on odinst.penguinpages.loc
al                                          N/A       N/A        Y       141750
Self-heal Daemon on thorst.penguinpages.loc
al                                          N/A       N/A        Y       245870

Task Status of Volume vmstore
--
There are no active volume tasks



How do I get oVirt to re-establish reality to what Gluster sees?



On Tue, Sep 22, 2020 at 8:59 AM Strahil Nikolov  wrote:
> Also in some rare cases, I have seen oVirt showing gluster as 2 out of 3 
> bricks up , but usually it was an UI issue and you go to UI and mark a "force 
> start" which will try to start any bricks that were down (won't affect 
> gluster) and will wake up the UI task to verify again brick status.
> 
> 
> https://github.com/gluster/gstatus is a good one to verify your cluster 
> health , yet human's touch is priceless in any kind of technology.
> 
> Best Regards,
> Strahil Nikolov
> 
> 
> 
> 
> 
> 
> On Tuesday, 22 September 2020 at 15:50:35 GMT+3, Jeremey Wise wrote:
> 
> 
> 
> 
> 
> 
> 
> when I posted last..  in the tread I paste a roling restart.    And...  now 
> it is replicating.
> 
> oVirt still showing wrong.  BUT..   I did my normal test from each of the 
> three nodes.
> 
> 1) Mount Gluster file system with localhost as primary and other two as 
> tertiary to local mount (like a client would do)
> 2) run test file create Ex:   echo $HOSTNAME >> /media/glustervolume/test.out
> 3) repeat from each node then read back that all are in sync.
> 
> I REALLY hate reboot (restart) as a fix.  I need to get better with root 
> cause of gluster issues if I am going to trust it.  Before when I manually 
> made the volumes and it was simply (vdo + gluster) then worst case was that 
> gluster would break... but I could always go into "brick" path and copy data 
> out.
> 
> Now with oVirt.. .and LVM and thin provisioning etc..   I am abstracted from 
> simple file recovery..  Without GLUSTER AND oVirt Engine up... all my 
> environment  and data is lost.  This means nodes moved more to "pets" then 
> cattle.
> 
> And with three nodes.. I can't afford to loose any pets. 
> 
> I will post more when I get cluster settled and work on those wierd notes 
> about quorum volumes noted on two nodes when glusterd is restarted.
> 
> Thanks,
> 
> On Tue, Sep 22, 2020 at 8:44 AM Strahil Nikolov  wrote:
>> Replication issue could mean that one of the client (FUSE mounts) is not 
>> attached to all bricks.
>> 
>> You can check the amount of clients via:
>> gluster volume status all client-list
>> 
>> 
>> As a prevention , just do a rolling restart:
>> - set a host in maintenance and mark it to stop glusterd service (I'm 
>> reffering to the UI)
>> - Activate the host , once it was moved to maintenance
>> 
>> Wait for the host's HE score to recover (silver/gold crown in UI) and then 
>> proceed with the next one.
>> 
>> Best Regards,
>> Strahil Nikolov
>> 
>> 
>> 
>> 
>> On Tuesday, 22 September 2020 at 14:55:35 GMT+3, Jeremey Wise wrote:
>> 
>> 
>> 
>> 
>> 
>> 
>> I did.
>> 
>> Here are all three nodes with restart. I find it odd ... their has been a 
>> set of messages at end (see below) which I don't know enough about what 
>> oVirt laid out to know if it is bad.
>> 
>> ###
>> [root@thor vmstore]# systemctl status glusterd
>> ● glusterd.service - GlusterFS, a clustered file-system server
>>    Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor 
>> preset: disabled)
>>   Drop-In: /etc/systemd/system/glusterd.service.d
>>            └─99-cpu.conf
>>    Active: active (running) since Mon 2020-09-21 20:32:26 EDT; 10h ago
>>      Docs: man:glusterd(8)
>>   Process: 2001 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid 
>> --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
>>  Main PID: 2113 (glusterd)
>>     Tasks: 151 (limit: 1235410)
>>    Memory: 

[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active

2020-09-22 Thread Strahil Nikolov via Users
That's really weird.
I would give the engine a 'Windows'-style fix (a.k.a. reboot).

I guess some of the engine's internal processes crashed/looped and it doesn't
see reality.

Best Regards,
Strahil Nikolov






On Tuesday, 22 September 2020 at 16:27:25 GMT+3, Jeremey Wise wrote:





Its like oVirt thinks there are only two nodes in gluster replication





# Yet it is clear the CLI shows three bricks.
[root@medusa vms]# gluster volume status vmstore
Status of volume: vmstore
Gluster process                             TCP Port  RDMA Port  Online  Pid
--
Brick thorst.penguinpages.local:/gluster_br
icks/vmstore/vmstore                        49154     0          Y       9444
Brick odinst.penguinpages.local:/gluster_br
icks/vmstore/vmstore                        49154     0          Y       3269
Brick medusast.penguinpages.local:/gluster_
bricks/vmstore/vmstore                      49154     0          Y       7841
Self-heal Daemon on localhost               N/A       N/A        Y       80152
Self-heal Daemon on odinst.penguinpages.loc
al                                          N/A       N/A        Y       141750
Self-heal Daemon on thorst.penguinpages.loc
al                                          N/A       N/A        Y       245870

Task Status of Volume vmstore
--
There are no active volume tasks



How do I get oVirt to re-establish reality to what Gluster sees?



On Tue, Sep 22, 2020 at 8:59 AM Strahil Nikolov  wrote:
> Also in some rare cases, I have seen oVirt showing gluster as 2 out of 3 
> bricks up , but usually it was an UI issue and you go to UI and mark a "force 
> start" which will try to start any bricks that were down (won't affect 
> gluster) and will wake up the UI task to verify again brick status.
> 
> 
> https://github.com/gluster/gstatus is a good one to verify your cluster 
> health , yet human's touch is priceless in any kind of technology.
> 
> Best Regards,
> Strahil Nikolov
> 
> 
> 
> 
> 
> 
> On Tuesday, 22 September 2020 at 15:50:35 GMT+3, Jeremey Wise wrote:
> 
> 
> 
> 
> 
> 
> 
> when I posted last..  in the tread I paste a roling restart.    And...  now 
> it is replicating.
> 
> oVirt still showing wrong.  BUT..   I did my normal test from each of the 
> three nodes.
> 
> 1) Mount Gluster file system with localhost as primary and other two as 
> tertiary to local mount (like a client would do)
> 2) run test file create Ex:   echo $HOSTNAME >> /media/glustervolume/test.out
> 3) repeat from each node then read back that all are in sync.
> 
> I REALLY hate reboot (restart) as a fix.  I need to get better with root 
> cause of gluster issues if I am going to trust it.  Before when I manually 
> made the volumes and it was simply (vdo + gluster) then worst case was that 
> gluster would break... but I could always go into "brick" path and copy data 
> out.
> 
> Now with oVirt.. .and LVM and thin provisioning etc..   I am abstracted from 
> simple file recovery..  Without GLUSTER AND oVirt Engine up... all my 
> environment  and data is lost.  This means nodes moved more to "pets" then 
> cattle.
> 
> And with three nodes.. I can't afford to loose any pets. 
> 
> I will post more when I get cluster settled and work on those wierd notes 
> about quorum volumes noted on two nodes when glusterd is restarted.
> 
> Thanks,
> 
> On Tue, Sep 22, 2020 at 8:44 AM Strahil Nikolov  wrote:
>> Replication issue could mean that one of the client (FUSE mounts) is not 
>> attached to all bricks.
>> 
>> You can check the amount of clients via:
>> gluster volume status all client-list
>> 
>> 
>> As a prevention , just do a rolling restart:
>> - set a host in maintenance and mark it to stop glusterd service (I'm 
>> reffering to the UI)
>> - Activate the host , once it was moved to maintenance
>> 
>> Wait for the host's HE score to recover (silver/gold crown in UI) and then 
>> proceed with the next one.
>> 
>> Best Regards,
>> Strahil Nikolov
>> 
>> 
>> 
>> 
>> On Tuesday, 22 September 2020 at 14:55:35 GMT+3, Jeremey Wise wrote:
>> 
>> 
>> 
>> 
>> 
>> 
>> I did.
>> 
>> Here are all three nodes with restart. I find it odd ... their has been a 
>> set of messages at end (see below) which I don't know enough about what 
>> oVirt laid out to know if it is bad.
>> 
>> ###
>> [root@thor vmstore]# systemctl status glusterd
>> ● glusterd.service - GlusterFS, a clustered file-system server
>>    Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor 
>> preset: disabled)
>>   Drop-In: /etc/systemd/system/glusterd.service.d
>>            └─99-cpu.conf
>>    Active: active (running) since Mon 2020-09-21 20:32:26 EDT; 10h ago
>>      Docs: man:glusterd(8)
>>   Process: 2001 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid 
>> --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, 

[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active

2020-09-22 Thread Strahil Nikolov via Users
Also in some rare cases I have seen oVirt showing gluster as 2 out of 3 bricks
up, but usually it was a UI issue: you go to the UI and mark a "force start",
which will try to start any bricks that were down (it won't affect gluster) and
will wake up the UI task to verify brick status again.


https://github.com/gluster/gstatus is a good one to verify your cluster health,
yet a human's touch is priceless in any kind of technology.

Best Regards,
Strahil Nikolov






On Tuesday, 22 September 2020 at 15:50:35 GMT+3, Jeremey Wise wrote:







when I posted last..  in the tread I paste a roling restart.    And...  now it 
is replicating.

oVirt still showing wrong.  BUT..   I did my normal test from each of the three 
nodes.

1) Mount Gluster file system with localhost as primary and other two as 
tertiary to local mount (like a client would do)
2) run test file create Ex:   echo $HOSTNAME >> /media/glustervolume/test.out
3) repeat from each node then read back that all are in sync.

I REALLY hate reboot (restart) as a fix.  I need to get better with root cause 
of gluster issues if I am going to trust it.  Before when I manually made the 
volumes and it was simply (vdo + gluster) then worst case was that gluster 
would break... but I could always go into "brick" path and copy data out.

Now with oVirt.. .and LVM and thin provisioning etc..   I am abstracted from 
simple file recovery..  Without GLUSTER AND oVirt Engine up... all my 
environment  and data is lost.  This means nodes moved more to "pets" then 
cattle.

And with three nodes.. I can't afford to loose any pets. 

I will post more when I get cluster settled and work on those wierd notes about 
quorum volumes noted on two nodes when glusterd is restarted.

Thanks,

On Tue, Sep 22, 2020 at 8:44 AM Strahil Nikolov  wrote:
> Replication issue could mean that one of the client (FUSE mounts) is not 
> attached to all bricks.
> 
> You can check the amount of clients via:
> gluster volume status all client-list
> 
> 
> As a prevention , just do a rolling restart:
> - set a host in maintenance and mark it to stop glusterd service (I'm 
> reffering to the UI)
> - Activate the host , once it was moved to maintenance
> 
> Wait for the host's HE score to recover (silver/gold crown in UI) and then 
> proceed with the next one.
> 
> Best Regards,
> Strahil Nikolov
> 
> 
> 
> 
> On Tuesday, 22 September 2020 at 14:55:35 GMT+3, Jeremey Wise wrote:
> 
> 
> 
> 
> 
> 
> I did.
> 
> Here are all three nodes with restart. I find it odd ... their has been a set 
> of messages at end (see below) which I don't know enough about what oVirt 
> laid out to know if it is bad.
> 
> ###
> [root@thor vmstore]# systemctl status glusterd
> ● glusterd.service - GlusterFS, a clustered file-system server
>    Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor 
> preset: disabled)
>   Drop-In: /etc/systemd/system/glusterd.service.d
>            └─99-cpu.conf
>    Active: active (running) since Mon 2020-09-21 20:32:26 EDT; 10h ago
>      Docs: man:glusterd(8)
>   Process: 2001 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid 
> --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
>  Main PID: 2113 (glusterd)
>     Tasks: 151 (limit: 1235410)
>    Memory: 3.8G
>       CPU: 6min 46.050s
>    CGroup: /glusterfs.slice/glusterd.service
>            ├─ 2113 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level 
> INFO
>            ├─ 2914 /usr/sbin/glusterfs -s localhost --volfile-id shd/data -p 
> /var/run/gluster/shd/data/data-shd.pid -l /var/log/glusterfs/glustershd.log 
> -S /var/run/gluster/2f41374c2e36bf4d.socket --xlator-option 
> *replicate*.node-uu>
>            ├─ 9342 /usr/sbin/glusterfsd -s thorst.penguinpages.local 
> --volfile-id data.thorst.penguinpages.local.gluster_bricks-data-data -p 
> /var/run/gluster/vols/data/thorst.penguinpages.local-gluster_bricks-data-data.pid
>  -S /var/r>
>            ├─ 9433 /usr/sbin/glusterfsd -s thorst.penguinpages.local 
> --volfile-id engine.thorst.penguinpages.local.gluster_bricks-engine-engine -p 
> /var/run/gluster/vols/engine/thorst.penguinpages.local-gluster_bricks-engine-engine.p>
>            ├─ 9444 /usr/sbin/glusterfsd -s thorst.penguinpages.local 
> --volfile-id vmstore.thorst.penguinpages.local.gluster_bricks-vmstore-vmstore 
> -p 
> /var/run/gluster/vols/vmstore/thorst.penguinpages.local-gluster_bricks-vmstore-vms>
>            └─35639 /usr/sbin/glusterfsd -s thorst.penguinpages.local 
> --volfile-id iso.thorst.penguinpages.local.gluster_bricks-iso-iso -p 
> /var/run/gluster/vols/iso/thorst.penguinpages.local-gluster_bricks-iso-iso.pid
>  -S /var/run/glu>
> 
> Sep 21 20:32:24 thor.penguinpages.local systemd[1]: Starting GlusterFS, a 
> clustered file-system server...
> Sep 21 20:32:26 thor.penguinpages.local systemd[1]: Started GlusterFS, a 
> clustered file-system server.
> Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22 
> 

[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active

2020-09-22 Thread Strahil Nikolov via Users
Usually I first start with:
'gluster volume heal <VOLNAME> info summary'

Anything that is not 'Connected' is bad.
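To run that check across every volume in one go (a sketch using only the
standard gluster CLI):

for v in $(gluster volume list); do
    echo "=== $v ==="
    gluster volume heal "$v" info summary
done
# any brick that is not 'Connected', or with a growing entry count, needs attention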

Yeah, the abstraction is not so nice, but the good thing is that you can always
extract the data from a single remaining node (it will require playing a little
bit with the quorum of the volume).
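As a rough sketch of what "playing with the quorum" means in a last-resort,
single-surviving-node recovery (volume name is an example; this deliberately
weakens split-brain protection, so restore the usual oVirt HCI settings as soon
as the data is copied off):

gluster volume set vmstore cluster.quorum-type none
gluster volume set vmstore cluster.server-quorum-type none
# ... mount the volume / copy the data out ...
gluster volume set vmstore cluster.quorum-type auto
gluster volume set vmstore cluster.server-quorum-type server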

Usually I have seen that the FUSE client fails to reconnect to a "gone bad and
recovered" brick and then you get that endless healing (as FUSE will write the
data to only 2 out of 3 bricks and then a heal is pending :D ).

I would go with the gluster logs and the brick logs, and then you can dig
deeper if you suspect a network issue.
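The usual places to look, assuming the default gluster log locations:

# management daemon log
less /var/log/glusterfs/glusterd.log
# one log per brick, named after the brick path
ls /var/log/glusterfs/bricks/
# FUSE client logs for the oVirt storage-domain mounts
ls /var/log/glusterfs/rhev-data-center-mnt-glusterSD-*.log
# quick scan for recent errors across all of them
grep -hE '\] E \[' /var/log/glusterfs/*.log | tail -n 20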


Best Regards,
Strahil Nikolov






On Tuesday, 22 September 2020 at 15:50:35 GMT+3, Jeremey Wise wrote:







when I posted last..  in the tread I paste a roling restart.    And...  now it 
is replicating.

oVirt still showing wrong.  BUT..   I did my normal test from each of the three 
nodes.

1) Mount Gluster file system with localhost as primary and other two as 
tertiary to local mount (like a client would do)
2) run test file create Ex:   echo $HOSTNAME >> /media/glustervolume/test.out
3) repeat from each node then read back that all are in sync.

I REALLY hate reboot (restart) as a fix.  I need to get better with root cause 
of gluster issues if I am going to trust it.  Before when I manually made the 
volumes and it was simply (vdo + gluster) then worst case was that gluster 
would break... but I could always go into "brick" path and copy data out.

Now with oVirt.. .and LVM and thin provisioning etc..   I am abstracted from 
simple file recovery..  Without GLUSTER AND oVirt Engine up... all my 
environment  and data is lost.  This means nodes moved more to "pets" then 
cattle.

And with three nodes.. I can't afford to loose any pets. 

I will post more when I get cluster settled and work on those wierd notes about 
quorum volumes noted on two nodes when glusterd is restarted.

Thanks,

On Tue, Sep 22, 2020 at 8:44 AM Strahil Nikolov  wrote:
> Replication issue could mean that one of the client (FUSE mounts) is not 
> attached to all bricks.
> 
> You can check the amount of clients via:
> gluster volume status all client-list
> 
> 
> As a prevention , just do a rolling restart:
> - set a host in maintenance and mark it to stop glusterd service (I'm 
> reffering to the UI)
> - Activate the host , once it was moved to maintenance
> 
> Wait for the host's HE score to recover (silver/gold crown in UI) and then 
> proceed with the next one.
> 
> Best Regards,
> Strahil Nikolov
> 
> 
> 
> 
> On Tuesday, 22 September 2020 at 14:55:35 GMT+3, Jeremey Wise wrote:
> 
> 
> 
> 
> 
> 
> I did.
> 
> Here are all three nodes with restart. I find it odd ... their has been a set 
> of messages at end (see below) which I don't know enough about what oVirt 
> laid out to know if it is bad.
> 
> ###
> [root@thor vmstore]# systemctl status glusterd
> ● glusterd.service - GlusterFS, a clustered file-system server
>    Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor 
> preset: disabled)
>   Drop-In: /etc/systemd/system/glusterd.service.d
>            └─99-cpu.conf
>    Active: active (running) since Mon 2020-09-21 20:32:26 EDT; 10h ago
>      Docs: man:glusterd(8)
>   Process: 2001 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid 
> --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
>  Main PID: 2113 (glusterd)
>     Tasks: 151 (limit: 1235410)
>    Memory: 3.8G
>       CPU: 6min 46.050s
>    CGroup: /glusterfs.slice/glusterd.service
>            ├─ 2113 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level 
> INFO
>            ├─ 2914 /usr/sbin/glusterfs -s localhost --volfile-id shd/data -p 
> /var/run/gluster/shd/data/data-shd.pid -l /var/log/glusterfs/glustershd.log 
> -S /var/run/gluster/2f41374c2e36bf4d.socket --xlator-option 
> *replicate*.node-uu>
>            ├─ 9342 /usr/sbin/glusterfsd -s thorst.penguinpages.local 
> --volfile-id data.thorst.penguinpages.local.gluster_bricks-data-data -p 
> /var/run/gluster/vols/data/thorst.penguinpages.local-gluster_bricks-data-data.pid
>  -S /var/r>
>            ├─ 9433 /usr/sbin/glusterfsd -s thorst.penguinpages.local 
> --volfile-id engine.thorst.penguinpages.local.gluster_bricks-engine-engine -p 
> /var/run/gluster/vols/engine/thorst.penguinpages.local-gluster_bricks-engine-engine.p>
>            ├─ 9444 /usr/sbin/glusterfsd -s thorst.penguinpages.local 
> --volfile-id vmstore.thorst.penguinpages.local.gluster_bricks-vmstore-vmstore 
> -p 
> /var/run/gluster/vols/vmstore/thorst.penguinpages.local-gluster_bricks-vmstore-vms>
>            └─35639 /usr/sbin/glusterfsd -s thorst.penguinpages.local 
> --volfile-id iso.thorst.penguinpages.local.gluster_bricks-iso-iso -p 
> /var/run/gluster/vols/iso/thorst.penguinpages.local-gluster_bricks-iso-iso.pid
>  -S /var/run/glu>
> 
> Sep 21 20:32:24 thor.penguinpages.local systemd[1]: Starting GlusterFS, a 
> clustered file-system server...

[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active

2020-09-22 Thread Jeremey Wise
when I posted last.. in the thread I pasted a rolling restart.  And... now
it is replicating.

oVirt still showing wrong.  BUT..   I did my normal test from each of the
three nodes.

1) Mount Gluster file system with localhost as primary and other two as
tertiary to local mount (like a client would do)
2) run test file create Ex:   echo $HOSTNAME >>
/media/glustervolume/test.out
3) repeat from each node then read back that all are in sync.
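Roughly the same three-step test as one script, using the hostnames and mount
path from this thread as examples:

#!/bin/bash
MNT=/media/vmstore          # local FUSE mount of the gluster volume
echo > "$MNT/test.out"      # truncate the shared test file
for h in thor odin medusa; do
    ssh "$h" "echo \$HOSTNAME >> $MNT/test.out"
done
cat "$MNT/test.out"         # expect one hostname per node if replication is healthy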

I REALLY hate reboot (restart) as a fix.  I need to get better at root-causing
gluster issues if I am going to trust it.  Before, when I manually made the
volumes and it was simply (vdo + gluster), the worst case was that gluster
would break... but I could always go into the "brick" path and copy data out.

Now with oVirt.. and LVM and thin provisioning etc..  I am abstracted from
simple file recovery..  Without GLUSTER AND the oVirt Engine up... all my
environment and data is lost.  This means the nodes moved more to "pets" than
cattle.

And with three nodes.. I can't afford to lose any pets.

I will post more when I get the cluster settled and work on those weird notes
about volume quorum that show up on two nodes when glusterd is restarted.

Thanks,

On Tue, Sep 22, 2020 at 8:44 AM Strahil Nikolov 
wrote:

> Replication issue could mean that one of the client (FUSE mounts) is not
> attached to all bricks.
>
> You can check the amount of clients via:
> gluster volume status all client-list
>
>
> As a prevention , just do a rolling restart:
> - set a host in maintenance and mark it to stop glusterd service (I'm
> reffering to the UI)
> - Activate the host , once it was moved to maintenance
>
> Wait for the host's HE score to recover (silver/gold crown in UI) and then
> proceed with the next one.
>
> Best Regards,
> Strahil Nikolov
>
>
>
>
> On Tuesday, 22 September 2020 at 14:55:35 GMT+3, Jeremey Wise <jeremey.w...@gmail.com> wrote:
>
>
>
>
>
>
> I did.
>
> Here are all three nodes with restart. I find it odd ... their has been a
> set of messages at end (see below) which I don't know enough about what
> oVirt laid out to know if it is bad.
>
> ###
> [root@thor vmstore]# systemctl status glusterd
> ● glusterd.service - GlusterFS, a clustered file-system server
>Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled;
> vendor preset: disabled)
>   Drop-In: /etc/systemd/system/glusterd.service.d
>└─99-cpu.conf
>Active: active (running) since Mon 2020-09-21 20:32:26 EDT; 10h ago
>  Docs: man:glusterd(8)
>   Process: 2001 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid
> --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
>  Main PID: 2113 (glusterd)
> Tasks: 151 (limit: 1235410)
>Memory: 3.8G
>   CPU: 6min 46.050s
>CGroup: /glusterfs.slice/glusterd.service
>├─ 2113 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level
> INFO
>├─ 2914 /usr/sbin/glusterfs -s localhost --volfile-id shd/data
> -p /var/run/gluster/shd/data/data-shd.pid -l
> /var/log/glusterfs/glustershd.log -S
> /var/run/gluster/2f41374c2e36bf4d.socket --xlator-option
> *replicate*.node-uu>
>├─ 9342 /usr/sbin/glusterfsd -s thorst.penguinpages.local
> --volfile-id data.thorst.penguinpages.local.gluster_bricks-data-data -p
> /var/run/gluster/vols/data/thorst.penguinpages.local-gluster_bricks-data-data.pid
> -S /var/r>
>├─ 9433 /usr/sbin/glusterfsd -s thorst.penguinpages.local
> --volfile-id engine.thorst.penguinpages.local.gluster_bricks-engine-engine
> -p
> /var/run/gluster/vols/engine/thorst.penguinpages.local-gluster_bricks-engine-engine.p>
>├─ 9444 /usr/sbin/glusterfsd -s thorst.penguinpages.local
> --volfile-id
> vmstore.thorst.penguinpages.local.gluster_bricks-vmstore-vmstore -p
> /var/run/gluster/vols/vmstore/thorst.penguinpages.local-gluster_bricks-vmstore-vms>
>└─35639 /usr/sbin/glusterfsd -s thorst.penguinpages.local
> --volfile-id iso.thorst.penguinpages.local.gluster_bricks-iso-iso -p
> /var/run/gluster/vols/iso/thorst.penguinpages.local-gluster_bricks-iso-iso.pid
> -S /var/run/glu>
>
> Sep 21 20:32:24 thor.penguinpages.local systemd[1]: Starting GlusterFS, a
> clustered file-system server...
> Sep 21 20:32:26 thor.penguinpages.local systemd[1]: Started GlusterFS, a
> clustered file-system server.
> Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22
> 00:32:28.605674] C [MSGID: 106003]
> [glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action]
> 0-management: Server quorum regained for volume data. Starting lo>
> Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22
> 00:32:28.639490] C [MSGID: 106003]
> [glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action]
> 0-management: Server quorum regained for volume engine. Starting >
> Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22
> 00:32:28.680665] C [MSGID: 106003]
> [glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action]
> 

[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active

2020-09-22 Thread Strahil Nikolov via Users
At around Sep 21 20:33 local time you got a loss of quorum - that's not good.

Could it be a network 'hiccup'?
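To pin down exactly when quorum was lost and regained (default glusterd log
path), something like:

grep -E 'Server quorum (lost|regained)' /var/log/glusterfs/glusterd.log
# then look at what the network/system was doing in that window, e.g.:
journalctl --since "2020-09-21 20:30" --until "2020-09-21 20:40" -u glusterd -u NetworkManager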

Best Regards,
Strahil Nikolov






On Tuesday, 22 September 2020 at 15:05:16 GMT+3, Jeremey Wise wrote:






I did.

Here are all three nodes with restart. I find it odd ... their has been a set 
of messages at end (see below) which I don't know enough about what oVirt laid 
out to know if it is bad.

###
[root@thor vmstore]# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor 
preset: disabled)
  Drop-In: /etc/systemd/system/glusterd.service.d
           └─99-cpu.conf
   Active: active (running) since Mon 2020-09-21 20:32:26 EDT; 10h ago
     Docs: man:glusterd(8)
  Process: 2001 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid 
--log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 2113 (glusterd)
    Tasks: 151 (limit: 1235410)
   Memory: 3.8G
      CPU: 6min 46.050s
   CGroup: /glusterfs.slice/glusterd.service
           ├─ 2113 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
           ├─ 2914 /usr/sbin/glusterfs -s localhost --volfile-id shd/data -p 
/var/run/gluster/shd/data/data-shd.pid -l /var/log/glusterfs/glustershd.log -S 
/var/run/gluster/2f41374c2e36bf4d.socket --xlator-option *replicate*.node-uu>
           ├─ 9342 /usr/sbin/glusterfsd -s thorst.penguinpages.local 
--volfile-id data.thorst.penguinpages.local.gluster_bricks-data-data -p 
/var/run/gluster/vols/data/thorst.penguinpages.local-gluster_bricks-data-data.pid
 -S /var/r>
           ├─ 9433 /usr/sbin/glusterfsd -s thorst.penguinpages.local 
--volfile-id engine.thorst.penguinpages.local.gluster_bricks-engine-engine -p 
/var/run/gluster/vols/engine/thorst.penguinpages.local-gluster_bricks-engine-engine.p>
           ├─ 9444 /usr/sbin/glusterfsd -s thorst.penguinpages.local 
--volfile-id vmstore.thorst.penguinpages.local.gluster_bricks-vmstore-vmstore 
-p 
/var/run/gluster/vols/vmstore/thorst.penguinpages.local-gluster_bricks-vmstore-vms>
           └─35639 /usr/sbin/glusterfsd -s thorst.penguinpages.local 
--volfile-id iso.thorst.penguinpages.local.gluster_bricks-iso-iso -p 
/var/run/gluster/vols/iso/thorst.penguinpages.local-gluster_bricks-iso-iso.pid 
-S /var/run/glu>

Sep 21 20:32:24 thor.penguinpages.local systemd[1]: Starting GlusterFS, a 
clustered file-system server...
Sep 21 20:32:26 thor.penguinpages.local systemd[1]: Started GlusterFS, a 
clustered file-system server.
Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22 
00:32:28.605674] C [MSGID: 106003] 
[glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: 
Server quorum regained for volume data. Starting lo>
Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22 
00:32:28.639490] C [MSGID: 106003] 
[glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: 
Server quorum regained for volume engine. Starting >
Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22 
00:32:28.680665] C [MSGID: 106003] 
[glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: 
Server quorum regained for volume vmstore. Starting>
Sep 21 20:33:24 thor.penguinpages.local glustershd[2914]: [2020-09-22 
00:33:24.813409] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 
0-data-client-0: server 172.16.101.101:24007 has not responded in the last 30 
seconds, discon>
Sep 21 20:33:24 thor.penguinpages.local glustershd[2914]: [2020-09-22 
00:33:24.815147] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 
2-engine-client-0: server 172.16.101.101:24007 has not responded in the last 30 
seconds, disc>
Sep 21 20:33:24 thor.penguinpages.local glustershd[2914]: [2020-09-22 
00:33:24.818735] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 
4-vmstore-client-0: server 172.16.101.101:24007 has not responded in the last 
30 seconds, dis>
Sep 21 20:33:36 thor.penguinpages.local glustershd[2914]: [2020-09-22 
00:33:36.816978] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 
3-iso-client-0: server 172.16.101.101:24007 has not responded in the last 42 
seconds, disconn>
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]# systemctl restart glusterd
[root@thor vmstore]# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor 
preset: disabled)
  Drop-In: /etc/systemd/system/glusterd.service.d
           └─99-cpu.conf
   Active: active (running) since Tue 2020-09-22 07:24:34 EDT; 2s ago
     Docs: man:glusterd(8)
  Process: 245831 ExecStart=/usr/sbin/glusterd -p 

[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active

2020-09-22 Thread Strahil Nikolov via Users
A replication issue could mean that one of the clients (FUSE mounts) is not
attached to all bricks.

You can check the amount of clients via:
gluster volume status all client-list
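On a replica-3 volume every brick should report the same, full set of clients;
a quick way to eyeball that (volume name is an example):

gluster volume status vmstore client-list
# per-brick client detail, including which host each FUSE client connects from
gluster volume status vmstore clients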


As a prevention, just do a rolling restart:
- set a host in maintenance and mark it to stop the glusterd service (I'm
referring to the UI)
- activate the host once it has been moved to maintenance

Wait for the host's HE score to recover (silver/gold crown in UI) and then 
proceed with the next one.

Best Regards,
Strahil Nikolov




On Tuesday, 22 September 2020 at 14:55:35 GMT+3, Jeremey Wise wrote:






I did.

Here are all three nodes with restart. I find it odd ... their has been a set 
of messages at end (see below) which I don't know enough about what oVirt laid 
out to know if it is bad.

###
[root@thor vmstore]# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor 
preset: disabled)
  Drop-In: /etc/systemd/system/glusterd.service.d
           └─99-cpu.conf
   Active: active (running) since Mon 2020-09-21 20:32:26 EDT; 10h ago
     Docs: man:glusterd(8)
  Process: 2001 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid 
--log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 2113 (glusterd)
    Tasks: 151 (limit: 1235410)
   Memory: 3.8G
      CPU: 6min 46.050s
   CGroup: /glusterfs.slice/glusterd.service
           ├─ 2113 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
           ├─ 2914 /usr/sbin/glusterfs -s localhost --volfile-id shd/data -p 
/var/run/gluster/shd/data/data-shd.pid -l /var/log/glusterfs/glustershd.log -S 
/var/run/gluster/2f41374c2e36bf4d.socket --xlator-option *replicate*.node-uu>
           ├─ 9342 /usr/sbin/glusterfsd -s thorst.penguinpages.local 
--volfile-id data.thorst.penguinpages.local.gluster_bricks-data-data -p 
/var/run/gluster/vols/data/thorst.penguinpages.local-gluster_bricks-data-data.pid
 -S /var/r>
           ├─ 9433 /usr/sbin/glusterfsd -s thorst.penguinpages.local 
--volfile-id engine.thorst.penguinpages.local.gluster_bricks-engine-engine -p 
/var/run/gluster/vols/engine/thorst.penguinpages.local-gluster_bricks-engine-engine.p>
           ├─ 9444 /usr/sbin/glusterfsd -s thorst.penguinpages.local 
--volfile-id vmstore.thorst.penguinpages.local.gluster_bricks-vmstore-vmstore 
-p 
/var/run/gluster/vols/vmstore/thorst.penguinpages.local-gluster_bricks-vmstore-vms>
           └─35639 /usr/sbin/glusterfsd -s thorst.penguinpages.local 
--volfile-id iso.thorst.penguinpages.local.gluster_bricks-iso-iso -p 
/var/run/gluster/vols/iso/thorst.penguinpages.local-gluster_bricks-iso-iso.pid 
-S /var/run/glu>

Sep 21 20:32:24 thor.penguinpages.local systemd[1]: Starting GlusterFS, a 
clustered file-system server...
Sep 21 20:32:26 thor.penguinpages.local systemd[1]: Started GlusterFS, a 
clustered file-system server.
Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22 
00:32:28.605674] C [MSGID: 106003] 
[glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: 
Server quorum regained for volume data. Starting lo>
Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22 
00:32:28.639490] C [MSGID: 106003] 
[glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: 
Server quorum regained for volume engine. Starting >
Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22 
00:32:28.680665] C [MSGID: 106003] 
[glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: 
Server quorum regained for volume vmstore. Starting>
Sep 21 20:33:24 thor.penguinpages.local glustershd[2914]: [2020-09-22 
00:33:24.813409] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 
0-data-client-0: server 172.16.101.101:24007 has not responded in the last 30 
seconds, discon>
Sep 21 20:33:24 thor.penguinpages.local glustershd[2914]: [2020-09-22 
00:33:24.815147] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 
2-engine-client-0: server 172.16.101.101:24007 has not responded in the last 30 
seconds, disc>
Sep 21 20:33:24 thor.penguinpages.local glustershd[2914]: [2020-09-22 
00:33:24.818735] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 
4-vmstore-client-0: server 172.16.101.101:24007 has not responded in the last 
30 seconds, dis>
Sep 21 20:33:36 thor.penguinpages.local glustershd[2914]: [2020-09-22 
00:33:36.816978] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 
3-iso-client-0: server 172.16.101.101:24007 has not responded in the last 42 
seconds, disconn>
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]# systemctl restart glusterd
[root@thor vmstore]# systemctl status glusterd
● glusterd.service - GlusterFS, a 

[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active

2020-09-22 Thread Jeremey Wise
I did.

Here are all three nodes with restart. I find it odd ... there has been a set
of messages at the end (see below) which I don't know enough about what oVirt
laid out to know if it is bad.

###
[root@thor vmstore]# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled;
vendor preset: disabled)
  Drop-In: /etc/systemd/system/glusterd.service.d
   └─99-cpu.conf
   Active: active (running) since Mon 2020-09-21 20:32:26 EDT; 10h ago
 Docs: man:glusterd(8)
  Process: 2001 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid
--log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 2113 (glusterd)
Tasks: 151 (limit: 1235410)
   Memory: 3.8G
  CPU: 6min 46.050s
   CGroup: /glusterfs.slice/glusterd.service
   ├─ 2113 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level
INFO
   ├─ 2914 /usr/sbin/glusterfs -s localhost --volfile-id shd/data
-p /var/run/gluster/shd/data/data-shd.pid -l
/var/log/glusterfs/glustershd.log -S
/var/run/gluster/2f41374c2e36bf4d.socket --xlator-option
*replicate*.node-uu>
   ├─ 9342 /usr/sbin/glusterfsd -s thorst.penguinpages.local
--volfile-id data.thorst.penguinpages.local.gluster_bricks-data-data -p
/var/run/gluster/vols/data/thorst.penguinpages.local-gluster_bricks-data-data.pid
-S /var/r>
   ├─ 9433 /usr/sbin/glusterfsd -s thorst.penguinpages.local
--volfile-id engine.thorst.penguinpages.local.gluster_bricks-engine-engine
-p
/var/run/gluster/vols/engine/thorst.penguinpages.local-gluster_bricks-engine-engine.p>
   ├─ 9444 /usr/sbin/glusterfsd -s thorst.penguinpages.local
--volfile-id
vmstore.thorst.penguinpages.local.gluster_bricks-vmstore-vmstore -p
/var/run/gluster/vols/vmstore/thorst.penguinpages.local-gluster_bricks-vmstore-vms>
   └─35639 /usr/sbin/glusterfsd -s thorst.penguinpages.local
--volfile-id iso.thorst.penguinpages.local.gluster_bricks-iso-iso -p
/var/run/gluster/vols/iso/thorst.penguinpages.local-gluster_bricks-iso-iso.pid
-S /var/run/glu>

Sep 21 20:32:24 thor.penguinpages.local systemd[1]: Starting GlusterFS, a
clustered file-system server...
Sep 21 20:32:26 thor.penguinpages.local systemd[1]: Started GlusterFS, a
clustered file-system server.
Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22
00:32:28.605674] C [MSGID: 106003]
[glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume data. Starting lo>
Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22
00:32:28.639490] C [MSGID: 106003]
[glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume engine. Starting >
Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22
00:32:28.680665] C [MSGID: 106003]
[glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume vmstore. Starting>
Sep 21 20:33:24 thor.penguinpages.local glustershd[2914]: [2020-09-22
00:33:24.813409] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired]
0-data-client-0: server 172.16.101.101:24007 has not responded in the last
30 seconds, discon>
Sep 21 20:33:24 thor.penguinpages.local glustershd[2914]: [2020-09-22
00:33:24.815147] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired]
2-engine-client-0: server 172.16.101.101:24007 has not responded in the
last 30 seconds, disc>
Sep 21 20:33:24 thor.penguinpages.local glustershd[2914]: [2020-09-22
00:33:24.818735] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired]
4-vmstore-client-0: server 172.16.101.101:24007 has not responded in the
last 30 seconds, dis>
Sep 21 20:33:36 thor.penguinpages.local glustershd[2914]: [2020-09-22
00:33:36.816978] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired]
3-iso-client-0: server 172.16.101.101:24007 has not responded in the last
42 seconds, disconn>
[root@thor vmstore]#
[root@thor vmstore]# systemctl restart glusterd
[root@thor vmstore]# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled;
vendor preset: disabled)
  Drop-In: /etc/systemd/system/glusterd.service.d
   └─99-cpu.conf
   Active: active (running) since Tue 2020-09-22 07:24:34 EDT; 2s ago
 Docs: man:glusterd(8)
  Process: 245831 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid
--log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 245832 (glusterd)
Tasks: 151 (limit: 1235410)
   Memory: 3.8G
  CPU: 132ms
   CGroup: /glusterfs.slice/glusterd.service
   ├─  2914 /usr/sbin/glusterfs -s localhost 

[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active

2020-09-22 Thread Strahil Nikolov via Users
Have you restarted glusterd.service on the affected node?
glusterd is just the management layer, and restarting it won't affect the brick processes.
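
A rough sequence on the affected node (using the data volume as an example) would be:

# note the brick PIDs first
gluster volume status data
systemctl restart glusterd
systemctl status glusterd
# the brick PIDs should be unchanged after the restart
gluster volume status data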

Best Regards,
Strahil Nikolov






On Tuesday, 22 September 2020 at 01:43:36 GMT+3, Jeremey Wise  wrote:






Start is not an option.

It notes two bricks, but the command line shows three bricks, all present.

[root@odin thorst.penguinpages.local:_vmstore]# gluster volume status data
Status of volume: data
Gluster process                             TCP Port  RDMA Port  Online  Pid
--
Brick thorst.penguinpages.local:/gluster_br
icks/data/data                              49152     0          Y       33123
Brick odinst.penguinpages.local:/gluster_br
icks/data/data                              49152     0          Y       2970
Brick medusast.penguinpages.local:/gluster_
bricks/data/data                            49152     0          Y       2646
Self-heal Daemon on localhost               N/A       N/A        Y       3004
Self-heal Daemon on thorst.penguinpages.loc
al                                          N/A       N/A        Y       33230
Self-heal Daemon on medusast.penguinpages.l
ocal                                        N/A       N/A        Y       2475

Task Status of Volume data
--
There are no active volume tasks

[root@odin thorst.penguinpages.local:_vmstore]# gluster peer status
Number of Peers: 2

Hostname: thorst.penguinpages.local
Uuid: 7726b514-e7c3-4705-bbc9-5a90c8a966c9
State: Peer in Cluster (Connected)

Hostname: medusast.penguinpages.local
Uuid: 977b2c1d-36a8-4852-b953-f75850ac5031
State: Peer in Cluster (Connected)
[root@odin thorst.penguinpages.local:_vmstore]#




On Mon, Sep 21, 2020 at 4:32 PM Strahil Nikolov  wrote:
> Just select the volume and press "start" . It will automatically mark "force 
> start" and will fix itself.
> 
> Best Regards,
> Strahil Nikolov
> 
> 
> 
> 
> 
> 
> On Monday, 21 September 2020 at 20:53:15 GMT+3, Jeremey Wise  wrote:
> 
> 
> 
> 
> 
> 
> oVirt engine shows  one of the gluster servers having an issue.  I did a 
> graceful shutdown of all three nodes over weekend as I have to move around 
> some power connections in prep for UPS.
> 
> Came back up.. but
> 
> 
> 
> And this is reflected in 2 bricks online (should be three for each volume)
> 
> 
> Command line shows gluster should be happy.
> 
> [root@thor engine]# gluster peer status
> Number of Peers: 2
> 
> Hostname: odinst.penguinpages.local
> Uuid: 83c772aa-33cd-430f-9614-30a99534d10e
> State: Peer in Cluster (Connected)
> 
> Hostname: medusast.penguinpages.local
> Uuid: 977b2c1d-36a8-4852-b953-f75850ac5031
> State: Peer in Cluster (Connected)
> [root@thor engine]#
> 
> # All bricks showing online
> [root@thor engine]# gluster volume status
> Status of volume: data
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> --
> Brick thorst.penguinpages.local:/gluster_br
> icks/data/data                              49152     0          Y       11001
> Brick odinst.penguinpages.local:/gluster_br
> icks/data/data                              49152     0          Y       2970
> Brick medusast.penguinpages.local:/gluster_
> bricks/data/data                            49152     0          Y       2646
> Self-heal Daemon on localhost               N/A       N/A        Y       50560
> Self-heal Daemon on odinst.penguinpages.loc
> al                                          N/A       N/A        Y       3004
> Self-heal Daemon on medusast.penguinpages.l
> ocal                                        N/A       N/A        Y       2475
> 
> Task Status of Volume data
> --
> There are no active volume tasks
> 
> Status of volume: engine
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> --
> Brick thorst.penguinpages.local:/gluster_br
> icks/engine/engine                          49153     0          Y       11012
> Brick odinst.penguinpages.local:/gluster_br
> icks/engine/engine                          49153     0          Y       2982
> Brick medusast.penguinpages.local:/gluster_
> bricks/engine/engine                        49153     0          Y       2657
> Self-heal Daemon on localhost               N/A       N/A        Y       50560
> Self-heal Daemon on odinst.penguinpages.loc
> al                                          N/A       N/A        Y       3004
> Self-heal Daemon on medusast.penguinpages.l
> ocal                                        N/A       N/A        Y       2475
> 
> Task Status of Volume engine
> --
> There are no active 

[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active

2020-09-21 Thread Jeremey Wise
Start is not an option.

It notes two bricks, but the command line shows three bricks, all present.

[root@odin thorst.penguinpages.local:_vmstore]# gluster volume status data
Status of volume: data
Gluster process                             TCP Port  RDMA Port  Online  Pid
--
Brick thorst.penguinpages.local:/gluster_br
icks/data/data                              49152     0          Y       33123
Brick odinst.penguinpages.local:/gluster_br
icks/data/data                              49152     0          Y       2970
Brick medusast.penguinpages.local:/gluster_
bricks/data/data                            49152     0          Y       2646
Self-heal Daemon on localhost               N/A       N/A        Y       3004
Self-heal Daemon on thorst.penguinpages.loc
al                                          N/A       N/A        Y       33230
Self-heal Daemon on medusast.penguinpages.l
ocal                                        N/A       N/A        Y       2475

Task Status of Volume data
--
There are no active volume tasks

[root@odin thorst.penguinpages.local:_vmstore]# gluster peer status
Number of Peers: 2

Hostname: thorst.penguinpages.local
Uuid: 7726b514-e7c3-4705-bbc9-5a90c8a966c9
State: Peer in Cluster (Connected)

Hostname: medusast.penguinpages.local
Uuid: 977b2c1d-36a8-4852-b953-f75850ac5031
State: Peer in Cluster (Connected)
[root@odin thorst.penguinpages.local:_vmstore]#
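
In case it helps, I can also pull per-brick detail and pending heal counts, in
case the gap between the UI and the CLI comes from a stale heal or quorum state
rather than the bricks themselves, e.g.:

gluster volume status data detail
gluster volume heal data info summary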




On Mon, Sep 21, 2020 at 4:32 PM Strahil Nikolov 
wrote:

> Just select the volume and press "start" . It will automatically mark
> "force start" and will fix itself.
>
> Best Regards,
> Strahil Nikolov
>
>
>
>
>
>
> On Monday, 21 September 2020 at 20:53:15 GMT+3, Jeremey Wise <
> jeremey.w...@gmail.com> wrote:
>
>
>
>
>
>
> oVirt engine shows  one of the gluster servers having an issue.  I did a
> graceful shutdown of all three nodes over weekend as I have to move around
> some power connections in prep for UPS.
>
> Came back up.. but
>
>
>
> And this is reflected in 2 bricks online (should be three for each volume)
>
>
> Command line shows gluster should be happy.
>
> [root@thor engine]# gluster peer status
> Number of Peers: 2
>
> Hostname: odinst.penguinpages.local
> Uuid: 83c772aa-33cd-430f-9614-30a99534d10e
> State: Peer in Cluster (Connected)
>
> Hostname: medusast.penguinpages.local
> Uuid: 977b2c1d-36a8-4852-b953-f75850ac5031
> State: Peer in Cluster (Connected)
> [root@thor engine]#
>
> # All bricks showing online
> [root@thor engine]# gluster volume status
> Status of volume: data
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> --
> Brick thorst.penguinpages.local:/gluster_br
> icks/data/data                              49152     0          Y       11001
> Brick odinst.penguinpages.local:/gluster_br
> icks/data/data                              49152     0          Y       2970
> Brick medusast.penguinpages.local:/gluster_
> bricks/data/data                            49152     0          Y       2646
> Self-heal Daemon on localhost               N/A       N/A        Y       50560
> Self-heal Daemon on odinst.penguinpages.loc
> al                                          N/A       N/A        Y       3004
> Self-heal Daemon on medusast.penguinpages.l
> ocal                                        N/A       N/A        Y       2475
>
> Task Status of Volume data
> --
> There are no active volume tasks
>
> Status of volume: engine
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> --
> Brick thorst.penguinpages.local:/gluster_br
> icks/engine/engine                          49153     0          Y       11012
> Brick odinst.penguinpages.local:/gluster_br
> icks/engine/engine                          49153     0          Y       2982
> Brick medusast.penguinpages.local:/gluster_
> bricks/engine/engine                        49153     0          Y       2657
> Self-heal Daemon on localhost               N/A       N/A        Y       50560
> Self-heal Daemon on odinst.penguinpages.loc
> al                                          N/A       N/A        Y       3004
> Self-heal Daemon on medusast.penguinpages.l
> ocal                                        N/A       N/A        Y       2475
>
> Task Status of Volume engine
> --
> There are no active volume tasks
>
> Status of volume: iso
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> --
> Brick thorst.penguinpages.local:/gluster_br
> icks/iso/iso                                49156     49157      Y       151426
> Brick 

[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active

2020-09-21 Thread Strahil Nikolov via Users
Just select the volume and press "start". It will automatically mark "force
start" and it will fix itself.

Best Regards,
Strahil Nikolov






On Monday, 21 September 2020 at 20:53:15 GMT+3, Jeremey Wise  wrote:






oVirt engine shows one of the gluster servers having an issue. I did a graceful
shutdown of all three nodes over the weekend as I had to move around some power
connections in prep for a UPS.

Came back up.. but



And this is reflected in 2 bricks online (should be three for each volume)


Command line shows gluster should be happy.

[root@thor engine]# gluster peer status
Number of Peers: 2

Hostname: odinst.penguinpages.local
Uuid: 83c772aa-33cd-430f-9614-30a99534d10e
State: Peer in Cluster (Connected)

Hostname: medusast.penguinpages.local
Uuid: 977b2c1d-36a8-4852-b953-f75850ac5031
State: Peer in Cluster (Connected)
[root@thor engine]#

# All bricks showing online
[root@thor engine]# gluster volume status
Status of volume: data
Gluster process                             TCP Port  RDMA Port  Online  Pid
--
Brick thorst.penguinpages.local:/gluster_br
icks/data/data                              49152     0          Y       11001
Brick odinst.penguinpages.local:/gluster_br
icks/data/data                              49152     0          Y       2970
Brick medusast.penguinpages.local:/gluster_
bricks/data/data                            49152     0          Y       2646
Self-heal Daemon on localhost               N/A       N/A        Y       50560
Self-heal Daemon on odinst.penguinpages.loc
al                                          N/A       N/A        Y       3004
Self-heal Daemon on medusast.penguinpages.l
ocal                                        N/A       N/A        Y       2475

Task Status of Volume data
--
There are no active volume tasks

Status of volume: engine
Gluster process                             TCP Port  RDMA Port  Online  Pid
--
Brick thorst.penguinpages.local:/gluster_br
icks/engine/engine                          49153     0          Y       11012
Brick odinst.penguinpages.local:/gluster_br
icks/engine/engine                          49153     0          Y       2982
Brick medusast.penguinpages.local:/gluster_
bricks/engine/engine                        49153     0          Y       2657
Self-heal Daemon on localhost               N/A       N/A        Y       50560
Self-heal Daemon on odinst.penguinpages.loc
al                                          N/A       N/A        Y       3004
Self-heal Daemon on medusast.penguinpages.l
ocal                                        N/A       N/A        Y       2475

Task Status of Volume engine
--
There are no active volume tasks

Status of volume: iso
Gluster process                             TCP Port  RDMA Port  Online  Pid
--
Brick thorst.penguinpages.local:/gluster_br
icks/iso/iso                                49156     49157      Y       151426
Brick odinst.penguinpages.local:/gluster_br
icks/iso/iso                                49156     49157      Y       69225
Brick medusast.penguinpages.local:/gluster_
bricks/iso/iso                              49156     49157      Y       45018
Self-heal Daemon on localhost               N/A       N/A        Y       50560
Self-heal Daemon on odinst.penguinpages.loc
al                                          N/A       N/A        Y       3004
Self-heal Daemon on medusast.penguinpages.l
ocal                                        N/A       N/A        Y       2475

Task Status of Volume iso
--
There are no active volume tasks

Status of volume: vmstore
Gluster process                             TCP Port  RDMA Port  Online  Pid
--
Brick thorst.penguinpages.local:/gluster_br
icks/vmstore/vmstore                        49154     0          Y       11023
Brick odinst.penguinpages.local:/gluster_br
icks/vmstore/vmstore                        49154     0          Y       2993
Brick medusast.penguinpages.local:/gluster_
bricks/vmstore/vmstore                      49154     0          Y       2668
Self-heal Daemon on localhost               N/A       N/A        Y       50560
Self-heal Daemon on medusast.penguinpages.l
ocal                                        N/A       N/A        Y       2475
Self-heal Daemon on odinst.penguinpages.loc
al                                          N/A       N/A        Y       3004

Task Status of Volume vmstore
--
There are no active volume 

[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active

2020-09-21 Thread Jayme
You could try setting the host to maintenance and checking the stop gluster
option, then re-activating the host, or try restarting the glusterd service on
the host.
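
If you go the service-restart route, on the affected host that would be roughly:

systemctl restart glusterd
systemctl status glusterd --no-pager
gluster peer status   # peers should show "Connected" again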

On Mon, Sep 21, 2020 at 2:52 PM Jeremey Wise  wrote:

>
> oVirt engine shows  one of the gluster servers having an issue.  I did a
> graceful shutdown of all three nodes over weekend as I have to move around
> some power connections in prep for UPS.
>
> Came back up.. but
>
> [image: image.png]
>
> And this is reflected in 2 bricks online (should be three for each volume)
> [image: image.png]
>
> Command line shows gluster should be happy.
>
> [root@thor engine]# gluster peer status
> Number of Peers: 2
>
> Hostname: odinst.penguinpages.local
> Uuid: 83c772aa-33cd-430f-9614-30a99534d10e
> State: Peer in Cluster (Connected)
>
> Hostname: medusast.penguinpages.local
> Uuid: 977b2c1d-36a8-4852-b953-f75850ac5031
> State: Peer in Cluster (Connected)
> [root@thor engine]#
>
> # All bricks showing online
> [root@thor engine]# gluster volume status
> Status of volume: data
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> --
> Brick thorst.penguinpages.local:/gluster_br
> icks/data/data                              49152     0          Y       11001
> Brick odinst.penguinpages.local:/gluster_br
> icks/data/data                              49152     0          Y       2970
> Brick medusast.penguinpages.local:/gluster_
> bricks/data/data                            49152     0          Y       2646
> Self-heal Daemon on localhost               N/A       N/A        Y       50560
> Self-heal Daemon on odinst.penguinpages.loc
> al                                          N/A       N/A        Y       3004
> Self-heal Daemon on medusast.penguinpages.l
> ocal                                        N/A       N/A        Y       2475
>
> Task Status of Volume data
> --
> There are no active volume tasks
>
> Status of volume: engine
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> --
> Brick thorst.penguinpages.local:/gluster_br
> icks/engine/engine                          49153     0          Y       11012
> Brick odinst.penguinpages.local:/gluster_br
> icks/engine/engine                          49153     0          Y       2982
> Brick medusast.penguinpages.local:/gluster_
> bricks/engine/engine                        49153     0          Y       2657
> Self-heal Daemon on localhost               N/A       N/A        Y       50560
> Self-heal Daemon on odinst.penguinpages.loc
> al                                          N/A       N/A        Y       3004
> Self-heal Daemon on medusast.penguinpages.l
> ocal                                        N/A       N/A        Y       2475
>
> Task Status of Volume engine
> --
> There are no active volume tasks
>
> Status of volume: iso
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> --
> Brick thorst.penguinpages.local:/gluster_br
> icks/iso/iso                                49156     49157      Y       151426
> Brick odinst.penguinpages.local:/gluster_br
> icks/iso/iso                                49156     49157      Y       69225
> Brick medusast.penguinpages.local:/gluster_
> bricks/iso/iso                              49156     49157      Y       45018
> Self-heal Daemon on localhost               N/A       N/A        Y       50560
> Self-heal Daemon on odinst.penguinpages.loc
> al                                          N/A       N/A        Y       3004
> Self-heal Daemon on medusast.penguinpages.l
> ocal                                        N/A       N/A        Y       2475
>
> Task Status of Volume iso
> --
> There are no active volume tasks
>
> Status of volume: vmstore
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> --
> Brick thorst.penguinpages.local:/gluster_br
> icks/vmstore/vmstore                        49154     0          Y       11023
> Brick odinst.penguinpages.local:/gluster_br
> icks/vmstore/vmstore                        49154     0          Y       2993
> Brick medusast.penguinpages.local:/gluster_
> bricks/vmstore/vmstore                      49154     0          Y       2668
> Self-heal Daemon on localhost               N/A       N/A        Y       50560
> Self-heal Daemon on medusast.penguinpages.l
> ocal                                        N/A       N/A        Y       2475
> Self-heal Daemon on odinst.penguinpages.loc
> al                                          N/A       N/A        Y       3004
>
> Task Status of Volume