[ovirt-users] Re: How to make oVirt + GlusterFS bulletproof

2020-10-16 Thread Jarosław Prokopowski
Thanks! That was a very interesting conversation :-)


[ovirt-users] Re: How to make oVirt + GlusterFS bulletproof

2020-10-14 Thread Strahil Nikolov via Users
strict-o-direct just allows the app to define whether direct I/O is needed, and yes,
that could be a reason for your data loss.

The good thing is that the feature is part of the virt group and there is an
"Optimize for Virt" button somewhere in the UI. Yet, I prefer the manual
approach of building gluster volumes, as the UI's primary focus is oVirt (quite
natural, right?).
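
For reference, a minimal sketch of the manual approach, using the volume name "data" from the volume info posted later in this thread (adjust to your own setup):

# apply the whole virt group of options in one go (roughly what the "Optimize for Virt" button does)
gluster volume set data group virt
# or set the option explicitly
gluster volume set data performance.strict-o-direct on
# verify the result
gluster volume get data performance.strict-o-direct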


Best Regards,
Strahil Nikolov 






[ovirt-users] Re: How to make oVirt + GlusterFS bulletproof

2020-10-14 Thread Jarosław Prokopowski
Thanks. I will get rid of multipath.

I did not set performance.strict-o-direct specifically; I only changed the
permissions of the volume to vdsm.kvm and applied the virt group.

Now I see performance.strict-o-direct was off. Could it be the reason for the
data loss?
Direct I/O is enabled in oVirt by the gluster mount option "-o
direct-io-mode=enable", right?

Below is the full list of the volume options.


Option                                   Value
------                                   -----
cluster.lookup-unhashed                  on
cluster.lookup-optimize                  on
cluster.min-free-disk                    10%
cluster.min-free-inodes                  5%
cluster.rebalance-stats                  off
cluster.subvols-per-directory            (null)
cluster.readdir-optimize                 off
cluster.rsync-hash-regex                 (null)
cluster.extra-hash-regex                 (null)
cluster.dht-xattr-name                   trusted.glusterfs.dht
cluster.randomize-hash-range-by-gfid     off
cluster.rebal-throttle                   normal
cluster.lock-migration                   off
cluster.force-migration                  off
cluster.local-volume-name                (null)
cluster.weighted-rebalance               on
cluster.switch-pattern                   (null)
cluster.entry-change-log                 on
cluster.read-subvolume                   (null)
cluster.read-subvolume-index             -1
cluster.read-hash-mode                   1
cluster.background-self-heal-count       8
cluster.metadata-self-heal               off
cluster.data-self-heal                   off
cluster.entry-self-heal                  off
cluster.self-heal-daemon                 on
cluster.heal-timeout                     600
cluster.self-heal-window-size            1
cluster.data-change-log                  on
cluster.metadata-change-log              on
cluster.data-self-heal-algorithm         full
cluster.eager-lock                       enable
disperse.eager-lock                      on
disperse.other-eager-lock                on
disperse.eager-lock-timeout              1
disperse.other-eager-lock-timeout        1
cluster.quorum-type                      auto
cluster.quorum-count                     (null)
cluster.choose-local                     off
cluster.self-heal-readdir-size           1KB
cluster.post-op-delay-secs               1
cluster.ensure-durability                on
cluster.consistent-metadata              no
cluster.heal-wait-queue-length           128
cluster.favorite-child-policy            none
cluster.full-lock                        yes
diagnostics.latency-measurement          off
diagnostics.dump-fd-stats                off
diagnostics.count-fop-hits               off
diagnostics.brick-log-level              INFO
diagnostics.client-log-level             INFO
diagnostics.brick-sys-log-level          CRITICAL
diagnostics.client-sys-log-level         CRITICAL
diagnostics.brick-logger                 (null)
diagnostics.client-logger                (null)

[ovirt-users] Re: How to make oVirt + GlusterFS bulletproof

2020-10-13 Thread Strahil Nikolov via Users
One recommendation is to get rid of the multipath for your SSD.
Replica 3 volumes are quite resilient and I'm really surprised it happened to 
you.

For the multipath stuff, you can create something like this:
[root@ovirt1 ~]# cat /etc/multipath/conf.d/blacklist.conf  
blacklist {
   wwid Crucial_CT256MX100SSD1_14390D52DCF5
}

As you are running multipath already, just run the following to get the wwid
of your SSD:
multipath -v4 | grep 'got wwid of'
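
Assuming multipathd runs as a systemd service, applying the blacklist could look roughly like this (the map name passed to 'multipath -f' may be the WWID or an mpathX alias, depending on user_friendly_names):

# pick up the new conf.d snippet
systemctl reload multipathd
# flush the now-blacklisted map if it still exists
multipath -f Crucial_CT256MX100SSD1_14390D52DCF5
# confirm the SSD no longer appears as a multipath device
multipath -ll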

What were the gluster volume options you were running with? oVirt runs the
volume with 'performance.strict-o-direct' and Direct I/O, so you should not
lose any data.


Best Regards,
Strahil Nikolov



 







[ovirt-users] Re: How to make oVirt + GlusterFS bulletproof

2020-10-13 Thread Jarosław Prokopowski
Hi Nikolov,

Thanks for the very interesting answer :-)

I do not use any RAID controller. I was hoping glusterfs would take care of
fault tolerance, but apparently it failed.
I have one Samsung 1TB SSD drive in each server for VM storage. I see it is of
type "multipath". There is an XFS filesystem over standard LVM (not thin).
Mount options are: inode64,noatime,nodiratime
SELinux was in permissive mode.

I must read more about the things you described, as I have never dived into them.
Please let me know if you have any suggestions :-)

Thanks a lot!
Jarek

 


[ovirt-users] Re: How to make oVirt + GlusterFS bulletproof

2020-10-11 Thread Strahil Nikolov via Users
Hi Jaroslaw,

That point was from someone else. I don't think that gluster has such a weak
point. The only weak point I have seen is the infrastructure it relies on top of
and, of course, the built-in limitations it has.

You need to verify the following (a combined example covering several of these points is sketched after the list):
- Mount options are important. Using 'nobarrier' without RAID-controller
protection is devastating. Also, I use the following option when running gluster +
SELinux in enforcing mode:

context=system_u:object_r:glusterd_brick_t:s0 - it tells the kernel what the
SELinux context is on all files/dirs in the gluster brick, and this reduces I/O
requests to the bricks.

My mount options are:
noatime,nodiratime,inode64,nouuid,context="system_u:object_r:glusterd_brick_t:s0"

- Next is your FS: if you use a HW RAID controller, you need to specify the
sunit= and swidth= for 'mkfs.xfs' (and don't forget the '-i size=512').
This tells XFS about the hardware beneath it.

- If you use thin LVM, you need to be sure that the '_tmeta' LV of the
thin pool LV is not on top of a VDO device, as it doesn't dedupe very well.
I'm using VDO in 'emulate512' mode as my 'PHY-SEC' is 4096 and oVirt doesn't like
that :). You can check yours via 'lsblk -t'.

- Configure and tune your VDO. I think that 1 VDO = 1 fast disk (NVMe/SSD), as
I'm not very good at tuning VDO. If you need dedupe, check Red Hat's
documentation about the indexing, as the defaults are not optimal.

- Next is the disk scheduler. If you use NVMe, the Linux kernel takes
care of it, but for SSDs and large HW arrays you can enable multiqueue
and switch to 'none' via udev rules. Of course, testing is needed for every
prod environment :)
Also consider using the noop/none I/O scheduler inside the VMs, as you don't want to
reorder I/O requests at the VM level, just at the host level.

- You can set your CPU to avoid switching to lower C-states -> those add extra
latency for the host/VM processes.

- Transparent Huge Pages can be a real problem, especially with large VMs.
oVirt 4.4.x should now support native Huge and Gumbo pages, which will reduce
the stress on the OS.

- vm.swappiness, vm.dirty_background_* and the other vm.dirty_* settings. You can check
what RH Gluster Storage uses via the redhat-storage-server RPMs in
ftp://ftp.redhat.com/redhat/linux/enterprise/7Server/en/RHS/SRPMS/

They control when the system starts flushing memory to disk
and when it blocks any process until all memory is flushed.
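
Not a recipe, just a rough sketch tying a few of the points above together; the device names, RAID geometry and sysctl values are placeholders, not recommendations:

# XFS for a brick on top of a HW RAID - take su/sw from your controller's stripe geometry
mkfs.xfs -i size=512 -d su=256k,sw=10 /dev/VG/brick_lv

# mount with the SELinux context option mentioned above
mount -o noatime,nodiratime,inode64,nouuid,context="system_u:object_r:glusterd_brick_t:s0" \
    /dev/VG/brick_lv /gluster_bricks/data

# switch non-rotational SATA/SAS devices to the 'none' multiqueue scheduler via a udev rule
# (on older kernels multiqueue itself may first need enabling, e.g. scsi_mod.use_blk_mq=Y on the kernel command line)
cat > /etc/udev/rules.d/60-ssd-scheduler.rules <<'EOF'
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="none"
EOF

# dirty-page / swappiness knobs - illustrative values only, compare with the redhat-storage-server defaults
sysctl -w vm.swappiness=10 vm.dirty_background_ratio=5 vm.dirty_ratio=20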


Best Regards,
Strahil Nikolov










[ovirt-users] Re: How to make oVirt + GlusterFS bulletproof

2020-10-10 Thread Jarosław Prokopowski
Thanks Alex. I actually think that the issue was caused by power loss on the 
switch socket.


[ovirt-users] Re: How to make oVirt + GlusterFS bulletproof

2020-10-10 Thread Jarosław Prokopowski
Thanks Strahil.
The data center is remote, so I will definitely ask the lab guys to ensure the
switch is connected to a battery-backed power socket.
So gluster's weak point is actually the network switch? Can it have
difficulty finding out which version of the data is correct after the switch was
off for some time?


[ovirt-users] Re: How to make oVirt + GlusterFS bulletproof

2020-10-09 Thread Alex McWhirter

A few things to consider,

What is your RAID situation per host? If you're using mdadm-based soft
RAID, you need to make sure your drives support power-loss data
protection. This is mostly only a feature on enterprise drives.
Essentially it ensures the drives reserve enough energy to flush the
write cache to disk on power loss. Most modern drives have a non-trivial
amount of built-in write cache, and losing that data on power loss will
gladly corrupt files, especially on soft-RAID setups.


If you're using hardware RAID, make sure you have disabled the drive-based
write cache, and that you have a battery / capacitor connected for the
RAID card's cache module (a quick way to check the drive cache from the OS is sketched below).
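
For drives the OS can see directly, one rough way to check and turn off the on-drive volatile write cache (device names are placeholders); drives hidden behind a hardware RAID controller are usually managed through the controller's own utility instead:

# SATA: show and then disable the drive's write cache
hdparm -W /dev/sda
hdparm -W 0 /dev/sda
# SAS: the same thing via sdparm, persisted across power cycles
sdparm --set WCE=0 --save /dev/sdb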


If you're using ZFS, which isn't really supported, you need a good UPS
and to have it set up to shut systems down cleanly. ZFS will not take
power outages well. Power-loss data protection is really important too,
but it's not a fix-all for ZFS, as it also caches writes in system RAM
quite a bit. A dedicated cache device with power-loss data protection
can help mitigate that, but really the power issues are the more pressing
concern in this situation.



As far as gluster is concerned, there is not much that can easily
corrupt data on power loss. My only thought is that if your switches
are not also battery-backed, that would be an issue.


On 2020-10-08 08:15, Jarosław Prokopowski wrote:

Hi Guys,

I had a situation happen twice where, due to an unexpected power outage,
something went wrong and the VMs on glusterfs were not recoverable.
Gluster heal did not help and I could not start the VMs any more.
Is there a way to make such a setup bulletproof?
Does it matter which volume type I choose - raw or qcow2? Or thin
provisioned versus preallocated?
Any other advice?


[ovirt-users] Re: How to make oVirt + GlusterFS bulletproof

2020-10-09 Thread Strahil Nikolov via Users
Based on the logs you shared, it looks like a network issue - but it could
always be something else.
If you ever experience a situation like that again, please share the logs
immediately and CC the gluster mailing list, in order to get assistance with
the root cause.

Best Regards,
Strahil Nikolov








[ovirt-users] Re: How to make oVirt + GlusterFS bulletproof

2020-10-09 Thread Jarosław Prokopowski
Hmm, I'm not sure. I just created glusterfs volumes on LVM volumes, changed
the ownership to vdsm.kvm and applied the virt group. Then I added them to oVirt as
storage for VMs.


[ovirt-users] Re: How to make oVirt + GlusterFS bulletproof

2020-10-09 Thread Jarosław Prokopowski
Hi Strahil,

I remember that after creating the volume I applied the virt group to it.

Volume info:


Volume Name: data
Type: Replicate
Volume ID: 05842cd6-7f16-4329-9ffd-64a0b4366fbe
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: host1storage:/gluster_bricks/data/data
Brick2: host2storage:/gluster_bricks/data/data
Brick3: host3storage:/gluster_bricks/data/data
Options Reconfigured:
performance.client-io-threads: on
nfs.disable: on
transport.address-family: inet
storage.owner-gid: 36
storage.owner-uid: 36
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: enable
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 1
features.shard: on
user.cifs: off
cluster.choose-local: off
client.event-threads: 4
server.event-threads: 4



I do not have full logs but I have some saved.


/var/log/messages:
-

Sep 14 08:36:20 host1 vdsm[4301]: ERROR Unhandled exception in <... timeout=30.0, duration=0.01 at 0x7f1244099210>
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task
    task()
  File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__
    self._callable()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 315, in __call__
    self._execute()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 357, in _execute
    self._vm.updateDriveVolume(drive)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4189, in updateDriveVolume
    vmDrive.volumeID)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 6101, in _getVolumeSize
    (domainID, volumeID))
StorageUnavailableError: Unable to get volume size for domain 88f5972f-58bd-469f-bc77-5bf3b1802291 volume cdf313d7-bed3-4fae-a803-1297cdf8c82f

Sep 14 08:37:20 host1 vdsm[4301]: ERROR Unhandled exception in <... timeout=30.0, duration=0.00 at 0x7f1244078490>
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task
    task()
  File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__
    self._callable()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 315, in __call__
    self._execute()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 357, in _execute
    self._vm.updateDriveVolume(drive)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4189, in updateDriveVolume
    vmDrive.volumeID)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 6101, in _getVolumeSize
    (domainID, volumeID))
StorageUnavailableError: Unable to get volume size for domain 88f5972f-58bd-469f-bc77-5bf3b1802291 volume cdf313d7-bed3-4fae-a803-1297cdf8c82f

Sep 14 08:38:20 host1 vdsm[4301]: ERROR Unhandled exception in <... timeout=30.0, duration=0.00 at 0x7f12045aaa90>
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task
    task()
  File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__
    self._callable()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 315, in __call__
    self._execute()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 357, in _execute
    self._vm.updateDriveVolume(drive)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4189, in updateDriveVolume
    vmDrive.volumeID)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 6101, in _getVolumeSize
    (domainID, volumeID))
StorageUnavailableError: Unable to get volume size for domain 88f5972f-58bd-469f-bc77-5bf3b1802291 volume cdf313d7-bed3-4fae-a803-1297cdf8c82f

Sep 14 08:39:20 host1 vdsm[4301]: ERROR Unhandled exception in <... timeout=30.0, duration=0.01 at 0x7f1287f189d0>
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task
    task()
  File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__
    self._callable()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 315, in __call__
    self._execute()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 357, in _execute
    self._vm.updateDriveVolume(drive)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4189, in updateDriveVolume
    vmDrive.volumeID)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 6101, in

[ovirt-users] Re: How to make oVirt + GlusterFS bulletproof

2020-10-08 Thread Strahil Nikolov via Users
Hi Jaroslaw,

It's more important to find the root cause of the data loss, as this is
definitely not supposed to happen (I have had several power outages myself without
issues).

Do you keep the logs?

For now, check if your gluster settings (gluster volume info VOL) match the
settings in the virt group (/var/lib/glusterd/group/virt - or something like
that); a rough way to do that comparison is sketched below.
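
Assuming the group file lives at /var/lib/glusterd/groups/virt (the exact path can differ between releases) and the volume is called VOL, the check could look roughly like this:

# the virt group file is a plain key=value list
cat /var/lib/glusterd/groups/virt
# print the group value vs. the volume's current value for each key
while IFS='=' read -r key value; do
    printf '%-40s group=%-12s volume=%s\n' "$key" "$value" \
        "$(gluster volume get VOL "$key" | awk -v k="$key" '$1==k {print $2}')"
done < /var/lib/glusterd/groups/virt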


Best Regards,
Strahil Nikolov






On Thursday, October 8, 2020 at 15:16:10 GMT+3, Jarosław Prokopowski wrote:





Hi Guys,

I had a situation happen twice where, due to an unexpected power outage, something went
wrong and the VMs on glusterfs were not recoverable.
Gluster heal did not help and I could not start the VMs any more.
Is there a way to make such a setup bulletproof?
Does it matter which volume type I choose - raw or qcow2? Or thin provisioned
versus preallocated?
Any other advice?


[ovirt-users] Re: How to make oVirt + GlusterFS bulletproof

2020-10-08 Thread WK
Are you using JBOD bricks, or do you have some sort of RAID for each of
the bricks?


Are you using sharding?

-wk

On 10/8/2020 6:11 AM, Jarosław Prokopowski wrote:

Hi Jayme, there is a UPS but the outages happened anyway. We also have a Raritan
KVM but it is not supported by oVirt.
The setup is 6 hosts - two sets of 3 hosts, each set using one replica 3 volume.
BTW what would be the best gluster volume solution for 6+ hosts?
  


[ovirt-users] Re: How to make oVirt + GlusterFS bulletproof

2020-10-08 Thread Jarosław Prokopowski
Hi Jayme, there is a UPS but the outages happened anyway. We also have a Raritan
KVM but it is not supported by oVirt.
The setup is 6 hosts - two sets of 3 hosts, each set using one replica 3 volume.
BTW what would be the best gluster volume solution for 6+ hosts?
 


[ovirt-users] Re: How to make oVirt + GlusterFS bulletproof

2020-10-08 Thread Jayme
IMO this is best handled at the hardware level with a UPS and battery/flash-backed
controllers. Can you share more details about your oVirt setup? How
many servers are you working with, and are you using replica 3 or replica 3
arbiter?
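
For context, the two layouts mentioned above would be created roughly like this (brick paths borrowed from the volume info posted earlier in the thread; a 6-host setup would add a second set of three bricks to make the volume distributed-replicated):

# plain replica 3 - three full copies of the data
gluster volume create data replica 3 \
    host1storage:/gluster_bricks/data/data \
    host2storage:/gluster_bricks/data/data \
    host3storage:/gluster_bricks/data/data
# replica 3 arbiter 1 - two full copies plus a metadata-only arbiter brick
gluster volume create data replica 3 arbiter 1 \
    host1storage:/gluster_bricks/data/data \
    host2storage:/gluster_bricks/data/data \
    host3storage:/gluster_bricks/arbiter/data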

On Thu, Oct 8, 2020 at 9:15 AM Jarosław Prokopowski 
wrote:

> Hi Guys,
>
> I had a situation happen twice where, due to an unexpected power outage,
> something went wrong and the VMs on glusterfs were not recoverable.
> Gluster heal did not help and I could not start the VMs any more.
> Is there a way to make such a setup bulletproof?
> Does it matter which volume type I choose - raw or qcow2? Or thin
> provisioned versus preallocated?
> Any other advice?