[ovirt-users] Re: How can Gluster be a HCI default, when it's hardly ever working?

2020-09-01 Thread thomas
In this specific case I even used virgin hardware originally.

Once I managed to kill the hosted engine by downgrading the data center cluster 
to legacy, I re-installed all Gluster storage from the VDO level up. No traces 
of a file system should be left with LVM and XFS on top, even if I didn't 
actually null the SSD (does writing nulls to an SSD actually cost you an 
overwrite these days, or is that translated into a trim by the firmware?).
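
A rough, untested sketch of how the zeroing could be skipped in favour of an 
explicit discard (the device path is a placeholder, it needs root, and 
blkdiscard ships with util-linux):

import pathlib
import subprocess

DEV = "/dev/sdX"  # placeholder, point this at the actual SSD

def supports_discard(dev):
    # a block device advertises discard support via discard_max_bytes > 0
    name = pathlib.Path(dev).name
    sysfs = pathlib.Path("/sys/block") / name / "queue" / "discard_max_bytes"
    return sysfs.exists() and int(sysfs.read_text()) > 0

if supports_discard(DEV):
    # one trim of the whole device, nothing actually written
    subprocess.run(["blkdiscard", DEV], check=True)
else:
    # fall back to really writing zeros
    subprocess.run(["dd", "if=/dev/zero", "of=" + DEV, "bs=1M",
                    "oflag=direct", "status=progress"],
                   check=False)  # dd exits non-zero at end of device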

No difference in terms of faults between the virgin hardware and the 
re-install, so stale Gluster extended file attributes etc. (your error theory, 
I believe) are not a factor.
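
For completeness, a rough sketch of how one could check a brick for those 
leftover attributes (the brick path is hypothetical, and reading trusted.* 
xattrs needs root):

import os

BRICK = "/gluster_bricks/vmstore/vmstore"  # hypothetical brick path

# the attributes that make glusterd reject a brick as
# "already part of a volume"
for attr in ("trusted.glusterfs.volume-id", "trusted.gfid"):
    try:
        value = os.getxattr(BRICK, attr)
        print(attr, "still set:", value.hex())
    except OSError:
        print(attr, "not set, brick looks clean")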

Choosing between the 'vmstore' and 'data' domains for the imports makes no 
difference, and neither does full allocation over thin allocation. And I didn't 
just see write errors from qemu-img, but read errors too, which had me 
concerned about some other source of corruption. That was another motivation to 
start with a fresh source, which meant a backup domain instead of an export 
domain or OVAs.

The storage underneath the backup domain is NFS (POSIX has a 4k issue and I'm 
not sure I want to try moving Glusters between farms just yet), which is easy 
to detach at the source and import at the target. If NFS is your default, oVirt 
can be so much easier, but in that more 'professional' domain we use vSphere 
and actual SAN storage. The attraction of oVirt for the lab use case critically 
depends on HCI and Gluster.
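
The detach can even be scripted; a rough sketch with the ovirt-engine-sdk4 
Python bindings (URL, credentials and names are placeholders, and a real 
script would wait until the domain reports maintenance before detaching):

import ovirtsdk4 as sdk

conn = sdk.Connection(
    url="https://source-engine.example/ovirt-engine/api",  # placeholder
    username="admin@internal",
    password="secret",  # placeholder
    ca_file="ca.pem",
)
system = conn.system_service()
sd = system.storage_domains_service().list(search="name=backup")[0]
dc = system.data_centers_service().list(search="name=Default")[0]

attached = (system.data_centers_service()
                  .data_center_service(dc.id)
                  .storage_domains_service()
                  .storage_domain_service(sd.id))
attached.deactivate()  # put the domain into maintenance first
attached.remove()      # then detach it from the data center
conn.close()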

The VMs were fine running from the backup domain (which incidentally must have 
lost its backup attribute at the target, because otherwise it should have kept 
the VMs from launching...), but once I tried moving their disks to the Gluster 
volume, I got empty or unusable disks again, or errors while moving.
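
At that point I stopped trusting any copy I couldn't verify. A rough sketch of 
how a moved disk could be checked against its source with qemu-img compare 
(paths are placeholders):

import subprocess

SRC = "/mnt/backup-domain/images/disk-src"     # placeholder paths
DST = "/mnt/gluster-vmstore/images/disk-dst"   # placeholder paths

# qemu-img compare checks logical content, so thin vs. full
# allocation doesn't matter; exit code 0 means identical
result = subprocess.run(["qemu-img", "compare", SRC, DST],
                        capture_output=True, text=True)
print("identical" if result.returncode == 0
      else "differ or unreadable:\n" + result.stdout + result.stderr)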

The only way I found to transfer from Gluster to Gluster was disk uploads, 
either via the GUI or via Python, but that results in fully allocated images 
and is very slow, around 50 MB/s even with Python. BTW sparsifying does nothing 
to those images, I guess because sectors full of nulls aren't actually the same 
as a logically unused sector. At least the VDO underneath should reduce some of 
the overhead.
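
The difference between written zeros and a real hole is easy to demonstrate 
with qemu-img map; a small self-contained sketch:

import json
import subprocess
import tempfile

with tempfile.NamedTemporaryFile(suffix=".raw") as img:
    img.write(b"\0" * 1024 * 1024)   # 1 MiB of zeros, actually written
    img.flush()
    img.truncate(2 * 1024 * 1024)    # second MiB is a hole, never written
    out = subprocess.run(
        ["qemu-img", "map", "--output=json", "-f", "raw", img.name],
        capture_output=True, text=True, check=True)
    for extent in json.loads(out.stdout):
        kind = "data" if extent["data"] else "hole"
        print(extent["start"], extent["length"], kind)
    # the written zeros still show up as data; only the hole is free,
    # which is why zero-filled sectors don't come out sparse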


[ovirt-users] Re: How can Gluster be a HCI default, when it's hardly ever working?

2020-09-01 Thread Strahil Nikolov via Users
Are you reusing a Gluster volume or have you created a fresh one?

Best Regards,
Strahil Nikolov

On Tuesday, September 1, 2020, at 02:58:19 GMT+3, tho...@hoberg.net wrote:

I've just tried to verify what you said here.

As a baseline I started with the 1nHCI Gluster setup. Of four VMs (two legacy, 
two Q35) on the single-node Gluster, one survived the import, one failed 
silently with an empty disk, and two failed somewhere in the middle of qemu-img 
writing the image to the Gluster storage. For each of those two, the failure 
always happened at the same block number, a unique one per machine, not in 
random places, as if qemu-img reading and writing the very same image could not 
agree. That's two types of error and a 75% failure rate.
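
To tell a flaky transport from a broken image, one could rerun the same 
conversion and watch the failing offset; a rough sketch (paths are 
placeholders, and the offset parsing assumes qemu-img's usual "error while 
writing at byte N" wording):

import re
import subprocess

SRC = "vm-disk.qcow2"                     # placeholder
DST = "/mnt/gluster-vmstore/vm-disk.raw"  # placeholder

for attempt in range(3):
    run = subprocess.run(
        ["qemu-img", "convert", "-f", "qcow2", "-O", "raw", SRC, DST],
        capture_output=True, text=True)
    if run.returncode == 0:
        print("attempt", attempt, "ok")
        continue
    m = re.search(r"at byte (\d+)", run.stderr)
    print("attempt", attempt, "failed at byte",
          m.group(1) if m else "unknown", "->", run.stderr.strip())
# a stable offset across attempts points at the image or the brick,
# a moving one at the transport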

I created another domain, basically using an NFS automount export from one of 
the HCI nodes (a 4.3 node serving as 4.4 storage), and imported the very same 
VMs (source all 4.3), transported via a re-attached export domain to 4.4. Three 
of the four imports worked fine, with no error from qemu-img writing to NFS. 
All VMs had full disk images and launched, which verified that at least there 
is nothing wrong with the exports.

But there was still one that failed with the same qemu-img error.

I then tried to move the disks from NFS to Gluster, which internally is also 
done via qemu-img, and those moves failed every time.
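
To take the engine out of the equation, one could run the same conversion by 
hand against the FUSE mounts; a rough, untested sketch (paths are 
placeholders):

import subprocess

SRC = "/mnt/nfs-domain/images/disk.qcow2"         # placeholder
DST = "/mnt/gluster-vmstore/test-copy.qcow2"      # placeholder

# same tool the engine uses for the move, minus the engine around it
subprocess.run(
    ["qemu-img", "convert", "-p", "-f", "qcow2", "-O", "qcow2", SRC, DST],
    check=True)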

Gluster or HCI seems a bit of a Russian roulette for migrations, and I wonder 
how much better it is for normal operations.

I'm still going to try moving via a backup domain (on NFS) and moving between 
that and Gluster, to see if it makes any difference.

I really haven't done a lot of stress testing yet with oVirt, but this 
experience doesn't build confidence.