[ovirt-users] Re: How can Gluster be a HCI default, when it's hardly ever working?

2020-09-01 Thread thomas
In this specific case I even used virgin hardware originally.

Once I managed to kill the hosted engine by downgrading the data center cluster 
to legacy, I re-installed all Gluster storage from the VDO level up. No traces 
of a file system should be left with LVM and XFS on top, even if I didn't 
actually null the SSD (does writing nulls to an SSD actually cost you an 
overwrite these days, or is that translated into a trim by the firmware?).
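
A rough, untested sketch of how the zeroing could be skipped in favour of an 
explicit discard (the device path is a placeholder, it needs root, and 
blkdiscard ships with util-linux):

import pathlib
import subprocess

DEV = "/dev/sdX"  # placeholder, point this at the actual SSD

def supports_discard(dev):
    # a block device advertises discard support via discard_max_bytes > 0
    name = pathlib.Path(dev).name
    sysfs = pathlib.Path("/sys/block") / name / "queue" / "discard_max_bytes"
    return sysfs.exists() and int(sysfs.read_text()) > 0

if supports_discard(DEV):
    # one trim of the whole device, nothing actually written
    subprocess.run(["blkdiscard", DEV], check=True)
else:
    # fall back to really writing zeros
    subprocess.run(["dd", "if=/dev/zero", "of=" + DEV, "bs=1M",
                    "oflag=direct", "status=progress"],
                   check=False)  # dd exits non-zero at end of device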

No difference in terms of faults between the virgin hardware and the 
re-install, so stale Gluster extended file attributes etc. (your error theory, 
I believe) are not a factor.
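
For completeness, a rough sketch of how one could check a brick for those 
leftover attributes (the brick path is hypothetical, and reading trusted.* 
xattrs needs root):

import os

BRICK = "/gluster_bricks/vmstore/vmstore"  # hypothetical brick path

# the attributes that make glusterd reject a brick as
# "already part of a volume"
for attr in ("trusted.glusterfs.volume-id", "trusted.gfid"):
    try:
        value = os.getxattr(BRICK, attr)
        print(attr, "still set:", value.hex())
    except OSError:
        print(attr, "not set, brick looks clean")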

Choosing between the 'vmstore' and 'data' domains for the imports makes no 
difference, and neither does full allocation over thin allocation. And I didn't 
just see write errors from qemu-img, but read errors too, which had me 
concerned about some other source of corruption. That was another motivation to 
start with a fresh source, which meant a backup domain instead of an export 
domain or OVAs.

The storage underneath the backup domain is NFS (POSIX has a 4k issue and I'm 
not sure I want to try moving Glusters between farms just yet), which is easy 
to detach at the source and import at the target. If NFS is your default, oVirt 
can be so much easier, but in that more 'professional' domain we use vSphere 
and actual SAN storage. The attraction of oVirt for the lab use case critically 
depends on HCI and Gluster.
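
The detach can even be scripted; a rough sketch with the ovirt-engine-sdk4 
Python bindings (URL, credentials and names are placeholders, and a real 
script would wait until the domain reports maintenance before detaching):

import ovirtsdk4 as sdk

conn = sdk.Connection(
    url="https://source-engine.example/ovirt-engine/api",  # placeholder
    username="admin@internal",
    password="secret",  # placeholder
    ca_file="ca.pem",
)
system = conn.system_service()
sd = system.storage_domains_service().list(search="name=backup")[0]
dc = system.data_centers_service().list(search="name=Default")[0]

attached = (system.data_centers_service()
                  .data_center_service(dc.id)
                  .storage_domains_service()
                  .storage_domain_service(sd.id))
attached.deactivate()  # put the domain into maintenance first
attached.remove()      # then detach it from the data center
conn.close()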

The VMs were fine running from the backup domain (which incidentally must have 
lost its backup attribute at the target, because otherwise it should have kept 
the VMs from launching...), but once I tried moving their disks to the Gluster 
volume, I got empty or unusable disks again, or errors while moving.
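
At that point I stopped trusting any copy I couldn't verify. A rough sketch of 
how a moved disk could be checked against its source with qemu-img compare 
(paths are placeholders):

import subprocess

SRC = "/mnt/backup-domain/images/disk-src"     # placeholder paths
DST = "/mnt/gluster-vmstore/images/disk-dst"   # placeholder paths

# qemu-img compare checks logical content, so thin vs. full
# allocation doesn't matter; exit code 0 means identical
result = subprocess.run(["qemu-img", "compare", SRC, DST],
                        capture_output=True, text=True)
print("identical" if result.returncode == 0
      else "differ or unreadable:\n" + result.stdout + result.stderr)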

The only way I found to transfer from Gluster to Gluster was disk uploads, 
either via the GUI or via Python, but that results in fully allocated images 
and is very slow, around 50 MB/s even with Python. BTW sparsifying does nothing 
to those images, I guess because sectors full of nulls aren't actually the same 
as a logically unused sector. At least the VDO underneath should reduce some of 
the overhead.
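
The difference between written zeros and a real hole is easy to demonstrate 
with qemu-img map; a small self-contained sketch:

import json
import subprocess
import tempfile

with tempfile.NamedTemporaryFile(suffix=".raw") as img:
    img.write(b"\0" * 1024 * 1024)   # 1 MiB of zeros, actually written
    img.flush()
    img.truncate(2 * 1024 * 1024)    # second MiB is a hole, never written
    out = subprocess.run(
        ["qemu-img", "map", "--output=json", "-f", "raw", img.name],
        capture_output=True, text=True, check=True)
    for extent in json.loads(out.stdout):
        kind = "data" if extent["data"] else "hole"
        print(extent["start"], extent["length"], kind)
    # the written zeros still show up as data; only the hole is free,
    # which is why zero-filled sectors don't come out sparse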


[ovirt-users] Re: How can Gluster be a HCI default, when it's hardly ever working?

2020-09-01 Thread Strahil Nikolov via Users
Are you reusing a Gluster volume or have you created a fresh one?

Best Regards,
Strahil Nikolov

On Tuesday, September 1, 2020, at 02:58:19 GMT+3, tho...@hoberg.net wrote:

I've just tried to verify what you said here.

As a baseline I started with the 1nHCI Gluster setup. Of four VMs (two legacy, 
two Q35) on the single-node Gluster, one survived the import, one failed 
silently with an empty disk, and two failed somewhere in the middle of qemu-img 
writing the image to the Gluster storage. For each of those two, the failure 
always happened at the same block number, a unique one per machine, not in 
random places, as if qemu-img reading and writing the very same image could not 
agree. That's two types of error and a 75% failure rate.
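
To tell a flaky transport from a broken image, one could rerun the same 
conversion and watch the failing offset; a rough sketch (paths are 
placeholders, and the offset parsing assumes qemu-img's usual "error while 
writing at byte N" wording):

import re
import subprocess

SRC = "vm-disk.qcow2"                     # placeholder
DST = "/mnt/gluster-vmstore/vm-disk.raw"  # placeholder

for attempt in range(3):
    run = subprocess.run(
        ["qemu-img", "convert", "-f", "qcow2", "-O", "raw", SRC, DST],
        capture_output=True, text=True)
    if run.returncode == 0:
        print("attempt", attempt, "ok")
        continue
    m = re.search(r"at byte (\d+)", run.stderr)
    print("attempt", attempt, "failed at byte",
          m.group(1) if m else "unknown", "->", run.stderr.strip())
# a stable offset across attempts points at the image or the brick,
# a moving one at the transport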

I created another domain, basically using an NFS automount export from one of 
the HCI nodes (a 4.3 node serving as 4.4 storage), and imported the very same 
VMs (source all 4.3), transported via a re-attached export domain to 4.4. Three 
of the four imports worked fine, with no error from qemu-img writing to NFS. 
All VMs had full disk images and launched, which verified that at least there 
is nothing wrong with the exports.

But there was still one that failed with the same qemu-img error.

I then tried to move the disks from NFS to Gluster, which internally is also 
done via qemu-img, and those moves failed every time.
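
To take the engine out of the equation, one could run the same conversion by 
hand against the FUSE mounts; a rough, untested sketch (paths are 
placeholders):

import subprocess

SRC = "/mnt/nfs-domain/images/disk.qcow2"         # placeholder
DST = "/mnt/gluster-vmstore/test-copy.qcow2"      # placeholder

# same tool the engine uses for the move, minus the engine around it
subprocess.run(
    ["qemu-img", "convert", "-p", "-f", "qcow2", "-O", "qcow2", SRC, DST],
    check=True)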

Gluster or HCI seems a bit of a Russian roulette for migrations, and I wonder 
how much better it is for normal operations.

I'm still going to try moving via a backup domain (on NFS) and moving between 
that and Gluster, to see if it makes any difference.

I really haven't done a lot of stress testing yet with oVirt, but this 
experience doesn't build confidence.