[ovirt-users] Re: How to unlock disk images?

2020-12-15 Thread thomas
Unlocked the snapshot, deleted the VM... done!
That was easier than I thought.

Thanks for your help in any case!
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/GFSXLHMJD57L3Q6U6N5KI4ED4GY6OAOG/


[ovirt-users] Re: How to unlock disk images?

2020-12-15 Thread thomas
And I should have guessed that this is only how the trouble starts =:-O

I immediately removed the locked snapshot images (actually they pretty much seemed 
to disappear by themselves, as delete operations might have been pending), but the 
VM that was doing the snapshot is still there, even though the disks are long gone.

But the VM still holds references to two (now gone) snapshot images with a status of 
'illegal' (because they don't exist anymore) that I cannot delete from the VM... and 
without deleting those I cannot delete the VM.

Any idea how to fix that?
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/XFYYH3DOOUPGZRY6Y64VSMCZICPPHMCE/


[ovirt-users] Re: How to unlock disk images?

2020-12-15 Thread thomas
Thanks Shani,

I had just found that myself and even managed to unlock and remove the images.

Somehow https://ovirt.org/develop/developer-guide/ does not seem to be reachable 
from the oVirt site navigation, but I have now discovered it via another post here 
and found a treasure trove.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/UWN6AP2T4KFSYHEOV42GTM634WFAGYOR/


[ovirt-users] How to unlock disk images?

2020-12-15 Thread thomas
On one oVirt 4.3 farm I have three locked images I'd like to clear out.

One is an ISO image that somehow never completed its transfer due to a slow 
network. It occupies little space, except in the GUI, where it sticks out and 
irritates. I guess it would just take an update somewhere in the Postgres database 
to unlock it and make it deletable. But since the schema isn't documented, I'd 
rather ask here: how do I unlock the image?

Two are left-overs from a snapshot that somehow never completed, one for the disk 
and another for the RAM part. I don't know how my colleague managed to get into 
that state, but impatience/concurrency was probably a factor; a transient failure 
of a node could have been another.

In any case, the snapshot operation has logically been going on for weeks without 
any real activity, has survived several restarts (after patches) of all nodes and 
the management engine, and shows no sign of disappearing voluntarily.

Again, I'd assume that I need to clear out the snapshot job, unlock the images and 
then delete what's left. Some easy SQL and most likely a management engine restart 
afterwards... if you knew what you were doing (or if there were an option in the 
GUI).

So how do I list/delete snapshot jobs that aren't really running any more?
And how do I unlock the images so I can delete them?
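
In case it helps to pin down what I'm after: I suspect the answer involves the 
dbutils scripts shipped with ovirt-engine rather than hand-written SQL. This is the 
untested sketch I have in mind (script locations and options are my assumption from 
other threads, so please correct me):

# On the engine host
cd /usr/share/ovirt-engine/setup/dbutils
./unlock_entity.sh -h                           # check the real options first
./unlock_entity.sh -t all -q                    # list locked entities without changing anything
./unlock_entity.sh -t disk <image-uuid>         # unlock the stuck ISO/disk image
./unlock_entity.sh -t snapshot <snapshot-uuid>  # unlock the stuck snapshot
./taskcleaner.sh -h                             # supposedly handles stale async tasks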

Thanks for your help!
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/XHC5GNLOZO2WFECII7AZX3QV2YEZ4NPO/


[ovirt-users] Re: CentOS 8 is dead

2020-12-15 Thread thomas
I am glad you think so and that it works for you. But I'd also guess that you put 
more than a partial FTE into the project.

I got attracted via the HCI angle they started pushing after Nutanix created a bit 
of a stir. The ability to use discarded production boxes for a lab, with the 
flexibility of just adding boxes as they were released and discarding them as they 
eventually died, was what I had in mind for oVirt. The inherent flexibility of 
Gluster would seem to support that; KVM's live migration and the fundamental design 
of the management engine and the VDSM agents fit, too.

Then come the details... and they fall quite short of that vision. I thought I had 
found another golden nugget like OpenVZ, but HCI is still more of a hack for three 
nodes, without a natural n+1 path beyond that, even though Gluster was supposed to 
outscale Lustre.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/LB5RIP5RZV5NOBOQ5GSX7KSBVAEEMV7S/


[ovirt-users] Re: CentOS 8 is dead

2020-12-15 Thread thomas
Hi Strahil,

OpenVZ is winding down, unfortunately. They haven't gone near CentOS 8 yet and I 
don't see that happening either. It's very unfortunate, because I really loved that 
project and I always preferred its container abstraction as well as its resource 
management tools, because scale-in is really the more prevalent use case where I 
work.

I see Kir Kolyshkin is now at Red Hat, but he doesn't seem to be working on cool 
things like CRIU any more, just Kubernetes.

I had issues trying to get CUDA working with OpenVZ (inside containers), too, 
mostly because Nvidia's software was doing stupid things like trying to load kernel 
modules. That's the reason I went back to VMs: they actually seem to have less 
trouble with GPUs these days, which must have cost man-centuries of engineering 
time to achieve.

I'll have to look at RHV pricing to see if it's an alternative. We seem to have 
extremely attractive RHEL licensing prices to push out our CentOS usage, and now we 
know how that will change. But I won't be able to use those licenses for my home 
lab, which is where I test things before I move them to the corporate lab, which is 
hundreds of miles away instead of under the table.

As far as I am concerned, I have already spent far too much time learning about 
oVirt. I didn't want a full-time involvement, but that is clearly what it takes, 
plus a 24x7 team while you're at it. My understanding of a fault-tolerant 
environment is really that you can move maintenance to where it suits you and that 
you just add another brick for more reliability. I've never operated Nutanix, but I 
can't imagine that expanding a 3-node HCI there is the same kind of experience. 
E.g. I'd naturally want to use erasure coding with higher node counts, but the 
Python code for that is simply not there: I twiddled with Ansible code to get a 4:1 
dispersed volume working that I now need to migrate to oVirt 4.4...

My commitment to Gluster is hampered by Red Hat's commitment to Gluster. Initially 
it seemed just genius, exactly the right approach, especially with VDO. But the 
integration between Gluster and oVirt seems stuck at six months after the Gluster 
acquisition, not the years that have passed since.

IMHO oVirt is a house of cards that's a little too agile to run even the lab parts 
of an enterprise.

For the next year I'll probably stick with it, but its chances of replacing VMware 
in production, even via RHEL/RHV, have shrunk to pretty much zero. Too bad, since 
that was exactly what I had in mind.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/UZLNWUQUAKYIEDDUU5G3AK4F2BXV6TVS/


[ovirt-users] Re: CentOS 8 is dead

2020-12-14 Thread thomas
The major issue with that is that oVirt 4.3 is out of maintenance, with the 
Python 2 EOL being a main reason.

CentOS reboots are happening, but they will be of little avail if the oVirt team 
won't support them.
It's a mess that does a lot of damage to this project, but IBM might just have 
different priorities.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/NXSORZZE2ABJIDV5G42EXR7SGSXKBX5W/


[ovirt-users] Re: High performance VM cannot migrate due to TSC frequency

2020-12-14 Thread thomas
I'd put my money on a fall-through error condition where TSC is simply the last 
check left with a 'good' error message to point to. I have clusters with CPUs that 
are both 10 years and 10x apart in performance migrating VMs between themselves 
quite happily (dual quad-core Sandy Bridge to 56-core Skylake), as long as you make 
sure the cluster CPU type and the default machine type are set low enough.
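
To compare what the hosts actually report, something like this on each of them 
might already show a difference (just a sketch; the virsh element name is from 
memory, so treat it as an assumption):

# What the kernel detected at boot
dmesg | grep -i "tsc.*hz"
# What libvirt advertises to the engine (element name recalled, not verified on 4.3)
virsh -r capabilities | grep -i "counter name='tsc'"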

Ok this is 4.3 still, but...

So what if you start the VM on the 'weak' host first? Can it then move freely?
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/ZCV2FQGEWIXKMRLTF6MAKDN2SMST7ZPX/


[ovirt-users] Re: Bad CPU TYPE after Centos 8.3

2020-12-14 Thread thomas
Hi, that CPU looks rather good to me: KVM/oVirt should love it (as would I)!

I am actually more inclined to believe that it may be a side-channel mitigation 
issue: KVM distinguishes between base CPUs and CPUs with enabled mitigations to 
ensure that you won't accidentally migrate a VM from a 'safe' CPU to one that's 
vulnerable.

So I suggest you have a look at that: see what's enabled in the BIOS and via 
microcode updates, and check it against what the cluster wants; a few stock 
commands for that are sketched below. Perhaps it involves as little as ensuring all 
current patches are in (with reboots) after the update to 8.3.
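
A quick way to see what the host currently provides, just as a sketch (standard 
sysfs/procfs paths, nothing oVirt-specific):

# Mitigation status the running kernel reports, per vulnerability
grep . /sys/devices/system/cpu/vulnerabilities/*
# Microcode revision actually loaded
grep -m1 microcode /proc/cpuinfo
# CPU flags exposed to the hypervisor (look for md_clear, ssbd, spec_ctrl, ...)
lscpu | grep -i '^flags'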

E.g. you might see this when the cluster was previously running on a system 
with mitigations active, but now mitigations are off (e.g. because the latest 
microcode updates are not loaded yet).

My personal impression is that it wasn't a terribly good idea to mix mitigations 
and CPU capabilities, but they probably didn't have the time/means for a full 
redesign of what they may have hoped was a singular, one-shot issue before it 
developed into a whole family of cancers.


Bonne chance!
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/OHLWLUG23QST3NJCVIX57RR6ZRMSY6VM/


[ovirt-users] Re: CentOS 8 is dead

2020-12-10 Thread thomas
I came to oVirt thinking that it was like CentOS: there might be bugs, but given 
the mainstream usage in home and corporate labs with light workloads and nothing 
special, the chances of hitting one should be pretty minor. I like looking for new 
frontiers on top of my OS, not inside it.

I had been running CentOS/OpenVZ for years in a previous job, mission-critical 
24x7 stuff where minutes of outage meant being grilled for hours in meetings 
afterwards, and with certified PCI-DSS compliance. We never had an issue with 
OpenVZ/CentOS; all those minutes of goofs were human error or Oracle inventing 
execution plans.

Boy, was I wrong about oVirt! Just setting it up took weeks. Ansible loves eating 
gigahertz and I was running on Atoms; I had to learn how to switch over to an i7 in 
mid-installation to have it finish at all. In the end I had learned tons of new 
things, but all I wanted was a cluster that would work as much out of the box as 
CentOS or OpenVZ.

Something as fundamental as exporting and importing a VM might simply not work 
and not even get fixed.

Migrating HCI from CentOS 7/oVirt 4.3 to CentOS 8/oVirt 4.4 is anything but 
smooth; a complete rebuild seems the lesser evil. Now if only exports and imports 
worked reliably!

Rebooting an HCI node seems to involve an "I am dying!" aria on the network, where 
the whole switch becomes unresponsive for 10 minutes and the fault-tolerant cluster 
on it becomes 100% unresponsive (including all other machines on that switch). I 
had so much fun resyncing Gluster file systems and searching through all those log 
files for signs of what was going on!
And the instructions on how to fix Gluster issues seem so wonderfully detailed yet 
vague that one could spend days trying to fix things, or just rebuild and restore. 
It doesn't help that the fate of Gluster very much seems to hang in the air, when 
the scalable HCI aspect was the only reason I ever wanted oVirt.

It could just be an issue with Realtek adapters, because I never observed anything 
like that with Intel NICs or on (recycled old) enterprise hardware.

I guess official support for a 3-node HCI cluster on passive Atoms isn't going to 
happen unless I make it happen 100% myself: it's open source, after all!

Just think what 3/6/9-node HCI based on the Raspberry Pi would do for the project! 
A 9-node HCI should deliver better 10Gbit GlusterFS performance than most QNAP 
units at the same cost with a single 10Gbit interface, even with 7:2 erasure 
coding!

I really think the future of oVirt may be at the edge, not in the datacenter 
core.

In short: oVirt is very much beta software and quite simply a full-time job if 
you depend on it working over time.

I can't see that getting any better when one beta gets to run on top of another 
beta. At the moment my oVirt experience has me doubting that RHV on RHEL would work 
any better, even if it's cheaper than VMware.

OpenVZ was simply the far better alternative to KVM for most of the things I 
needed from virtualization, and it was mainly the hassle of trying to make that 
work with RHEL that had me switching to CentOS. CentOS with OpenVZ was the bedrock 
of that business for 15 years and proved to me that Red Hat was hell-bent on making 
bad decisions on technological direction.

I would have actually liked to pay a license for each of the physical hosts we 
used, but it turned out much less of a bother to forget about negotiating 
licensing conditions for OpenVZ containers and use CentOS instead.

BTW: I am going into a meeting tomorrow where, after two years of pilot usage, we 
might just decide to kill our current oVirt farms, because they didn't deliver on 
"a free open-source virtualization solution for your entire enterprise".

I'll keep my Atoms running a little longer, mostly because I have nothing else to 
use them for. For the first time in months they show zero Gluster replication 
errors, perhaps because, for lack of updates, there have been no node reboots. 
CentOS 7 is stable, but oVirt 4.3 is out of support.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/GBZ46VXFZZXOMBNLNQTB34ZFYFVGDPB2/


[ovirt-users] Re: Latest ManagedBlockDevice documentation

2020-10-15 Thread Michael Thomas

On 10/15/20 11:27 AM, Jeff Bailey wrote:


On 10/15/2020 12:07 PM, Michael Thomas wrote:

On 10/15/20 10:19 AM, Jeff Bailey wrote:


On 10/15/2020 10:01 AM, Michael Thomas wrote:

Getting closer...

I recreated the storage domain and added rbd_default_features=3 to 
ceph.conf.  Now I see the new disk being created with (what I think 
is) the correct set of features:


# rbd info rbd.ovirt.data/volume-f4ac68c6-e5f7-4b01-aed0-36a55b901fbf
rbd image 'volume-f4ac68c6-e5f7-4b01-aed0-36a55b901fbf':
    size 100 GiB in 25600 objects
    order 22 (4 MiB objects)
    snapshot_count: 0
    id: 70aab541cb331
    block_name_prefix: rbd_data.70aab541cb331
    format: 2
    features: layering
    op_features:
    flags:
    create_timestamp: Thu Oct 15 06:53:23 2020
    access_timestamp: Thu Oct 15 06:53:23 2020
    modify_timestamp: Thu Oct 15 06:53:23 2020

However, I'm still unable to attach the disk to a VM.  This time 
it's a permissions issue on the ovirt node where the VM is running. 
It looks like it can't read the temporary ceph config file that is 
sent over from the engine:



Are you using octopus?  If so, the config file that's generated is 
missing the "[global]" at the top and octopus doesn't like that. It's 
been patched upstream.


Yes, I am using Octopus (15.2.4).  Do you have a pointer to the 
upstream patch or issue so that I can watch for a release with the fix?



https://bugs.launchpad.net/cinder/+bug/1865754


And for anyone playing along at home, I was able to map this back to the 
openstack ticket:


https://review.opendev.org/#/c/730376/

It's a simple fix.  I just changed line 100 of 
/usr/lib/python3.6/site-packages/os_brick/initiator/connectors/rbd.py to:


conf_file.writelines(["[global]", "\n", mon_hosts, "\n", keyring, "\n"])
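
With that change in place, the temporary config that gets pushed to the host (the 
/tmp/brickrbd_* file mentioned earlier in the thread) should start with the 
[global] header; a trivial check, assuming the file name pattern stays the same:

# On the host, while such a file is still around
head -n1 /tmp/brickrbd_*    # expect: [global]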


After applying this patch, I was finally able to attach my ceph block 
device to a running VM.  I've now got virtually unlimited data storage 
for my VMs.  Many thanks to you and Benny for the help!


--Mike
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/UET4Q7BDRBWPWSQ4FNZY5XW6S4LJV4KK/


[ovirt-users] Re: Latest ManagedBlockDevice documentation

2020-10-15 Thread Michael Thomas

On 10/15/20 10:19 AM, Jeff Bailey wrote:


On 10/15/2020 10:01 AM, Michael Thomas wrote:

Getting closer...

I recreated the storage domain and added rbd_default_features=3 to 
ceph.conf.  Now I see the new disk being created with (what I think 
is) the correct set of features:


# rbd info rbd.ovirt.data/volume-f4ac68c6-e5f7-4b01-aed0-36a55b901fbf
rbd image 'volume-f4ac68c6-e5f7-4b01-aed0-36a55b901fbf':
    size 100 GiB in 25600 objects
    order 22 (4 MiB objects)
    snapshot_count: 0
    id: 70aab541cb331
    block_name_prefix: rbd_data.70aab541cb331
    format: 2
    features: layering
    op_features:
    flags:
    create_timestamp: Thu Oct 15 06:53:23 2020
    access_timestamp: Thu Oct 15 06:53:23 2020
    modify_timestamp: Thu Oct 15 06:53:23 2020

However, I'm still unable to attach the disk to a VM.  This time it's 
a permissions issue on the ovirt node where the VM is running.  It 
looks like it can't read the temporary ceph config file that is sent 
over from the engine:



Are you using octopus?  If so, the config file that's generated is 
missing the "[global]" at the top and octopus doesn't like that.  It's 
been patched upstream.


Yes, I am using Octopus (15.2.4).  Do you have a pointer to the upstream 
patch or issue so that I can watch for a release with the fix?


--Mike
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/LBG4EEWFWDLSBTBLYD6NTBQWTBJRPQDK/


[ovirt-users] Re: Latest ManagedBlockDevice documentation

2020-10-15 Thread Michael Thomas

Getting closer...

I recreated the storage domain and added rbd_default_features=3 to 
ceph.conf.  Now I see the new disk being created with (what I think is) 
the correct set of features:


# rbd info rbd.ovirt.data/volume-f4ac68c6-e5f7-4b01-aed0-36a55b901fbf
rbd image 'volume-f4ac68c6-e5f7-4b01-aed0-36a55b901fbf':
size 100 GiB in 25600 objects
order 22 (4 MiB objects)
snapshot_count: 0
id: 70aab541cb331
block_name_prefix: rbd_data.70aab541cb331
format: 2
features: layering
op_features:
flags:
create_timestamp: Thu Oct 15 06:53:23 2020
access_timestamp: Thu Oct 15 06:53:23 2020
modify_timestamp: Thu Oct 15 06:53:23 2020

However, I'm still unable to attach the disk to a VM.  This time it's a 
permissions issue on the ovirt node where the VM is running.  It looks 
like it can't read the temporary ceph config file that is sent over from 
the engine:


https://pastebin.com/pGjMTvcn

The file '/tmp/brickrbd_nwc3kywk' on the ovirt node is only accessible 
by root:


[root@ovirt4 ~]# ls -l /tmp/brickrbd_nwc3kywk
-rw-------. 1 root root 146 Oct 15 07:25 /tmp/brickrbd_nwc3kywk

...and I'm guessing that it's being accessed by the vdsm user?
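
(If that guess is right, something like this on the node should reproduce the 
failure; using 'vdsm' as the accessing user is my assumption:)

sudo -u vdsm cat /tmp/brickrbd_nwc3kywk    # expect: Permission denied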

--Mike

On 10/14/20 10:59 AM, Michael Thomas wrote:

Hi Benny,

You are correct, I tried attaching to a running VM (which failed), then
tried booting a new VM using this disk (which also failed).  I'll use
the workaround in the bug report going forward.

I'll just recreate the storage domain, since at this point I have
nothing in it to lose.

Regards,

--Mike

On 10/14/20 9:32 AM, Benny Zlotnik wrote:

Did you attempt to start a VM with this disk and it failed, or you
didn't try at all? If it's the latter then the error is strange...
If it's the former, there is a known issue with multipath at the
moment, see [1] for a workaround: you might have issues when detaching
volumes later, because multipath grabs the rbd devices, which makes
`rbd unmap` fail. It will be fixed soon by automatically
blacklisting rbd in the multipath configuration.

Regarding editing, you can submit an RFE for this, but it is currently
not possible. The options are indeed to either recreate the storage
domain or edit the database table


[1] https://bugzilla.redhat.com/show_bug.cgi?id=1881832#c8




On Wed, Oct 14, 2020 at 3:40 PM Michael Thomas  wrote:


On 10/14/20 3:30 AM, Benny Zlotnik wrote:

Jeff is right, it's a limitation of kernel rbd, the recommendation is
to add `rbd default features = 3` to the configuration. I think there
are plans to support rbd-nbd in cinderlib which would allow using
additional features, but I'm not aware of anything concrete.

Additionally, the path for the cinderlib log is
/var/log/ovirt-engine/cinderlib/cinderlib.log, the error in this case
would appear in the vdsm.log on the relevant host, and would look
something like "RBD image feature set mismatch. You can disable
features unsupported by the kernel with 'rbd feature disable'"


Thanks for the pointer!  Indeed,
/var/log/ovirt-engine/cinderlib/cinderlib.log has the errors that I was
looking for.  In this case, it was a user error entering the RBDDriver
options:


2020-10-13 15:15:25,640 - cinderlib.cinderlib - WARNING - Unknown config
option use_multipath_for_xfer

...it should have been 'use_multipath_for_image_xfer'.

Now my attempts to fix it are failing...  If I go to 'Storage -> Storage
Domains -> Manage Domain', all driver options are uneditable except for
'Name'.

Then I thought that maybe I can't edit the driver options while a disk
still exists, so I tried removing the one disk in this domain.  But even
after multiple attempts, it still fails with:

2020-10-14 07:26:31,340 - cinder.volume.drivers.rbd - INFO - volume
volume-5419640e-445f-4b3f-a29d-b316ad031b7a no longer exists in backend
2020-10-14 07:26:31,353 - cinderlib-client - ERROR - Failure occurred
when trying to run command 'delete_volume': (psycopg2.IntegrityError)
update or delete on table "volumes" violates foreign key constraint
"volume_attachment_volume_id_fkey" on table "volume_attachment"
DETAIL:  Key (id)=(5419640e-445f-4b3f-a29d-b316ad031b7a) is still
referenced from table "volume_attachment".

See https://pastebin.com/KwN1Vzsp for the full log entries related to
this removal.

It's not lying, the volume no longer exists in the rbd pool, but the
cinder database still thinks it's attached, even though I was never able
to get it to attach to a VM.

What are my options for cleaning up this stale disk in the cinder database?

How can I update the driver options in my storage domain (deleting and
recreating the domain is acceptable, if possible)?

--Mike




___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://

[ovirt-users] Re: Latest ManagedBlockDevice documentation

2020-10-14 Thread Michael Thomas
Hi Benny,

You are correct, I tried attaching to a running VM (which failed), then
tried booting a new VM using this disk (which also failed).  I'll use
the workaround in the bug report going forward.

I'll just recreate the storage domain, since at this point I have
nothing in it to lose.

Regards,

--Mike

On 10/14/20 9:32 AM, Benny Zlotnik wrote:
> Did you attempt to start a VM with this disk and it failed, or you
> didn't try at all? If it's the latter then the error is strange...
> If it's the former, there is a known issue with multipath at the
> moment, see [1] for a workaround: you might have issues when detaching
> volumes later, because multipath grabs the rbd devices, which makes
> `rbd unmap` fail. It will be fixed soon by automatically
> blacklisting rbd in the multipath configuration.
> 
> Regarding editing, you can submit an RFE for this, but it is currently
> not possible. The options are indeed to either recreate the storage
> domain or edit the database table
> 
> 
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1881832#c8
> 
> 
> 
> 
> On Wed, Oct 14, 2020 at 3:40 PM Michael Thomas  wrote:
>>
>> On 10/14/20 3:30 AM, Benny Zlotnik wrote:
>>> Jeff is right, it's a limitation of kernel rbd, the recommendation is
>>> to add `rbd default features = 3` to the configuration. I think there
>>> are plans to support rbd-nbd in cinderlib which would allow using
>>> additional features, but I'm not aware of anything concrete.
>>>
>>> Additionally, the path for the cinderlib log is
>>> /var/log/ovirt-engine/cinderlib/cinderlib.log, the error in this case
>>> would appear in the vdsm.log on the relevant host, and would look
>>> something like "RBD image feature set mismatch. You can disable
>>> features unsupported by the kernel with 'rbd feature disable'"
>>
>> Thanks for the pointer!  Indeed,
>> /var/log/ovirt-engine/cinderlib/cinderlib.log has the errors that I was
>> looking for.  In this case, it was a user error entering the RBDDriver
>> options:
>>
>>
>> 2020-10-13 15:15:25,640 - cinderlib.cinderlib - WARNING - Unknown config
>> option use_multipath_for_xfer
>>
>> ...it should have been 'use_multipath_for_image_xfer'.
>>
>> Now my attempts to fix it are failing...  If I go to 'Storage -> Storage
>> Domains -> Manage Domain', all driver options are uneditable except for
>> 'Name'.
>>
>> Then I thought that maybe I can't edit the driver options while a disk
>> still exists, so I tried removing the one disk in this domain.  But even
>> after multiple attempts, it still fails with:
>>
>> 2020-10-14 07:26:31,340 - cinder.volume.drivers.rbd - INFO - volume
>> volume-5419640e-445f-4b3f-a29d-b316ad031b7a no longer exists in backend
>> 2020-10-14 07:26:31,353 - cinderlib-client - ERROR - Failure occurred
>> when trying to run command 'delete_volume': (psycopg2.IntegrityError)
>> update or delete on table "volumes" violates foreign key constraint
>> "volume_attachment_volume_id_fkey" on table "volume_attachment"
>> DETAIL:  Key (id)=(5419640e-445f-4b3f-a29d-b316ad031b7a) is still
>> referenced from table "volume_attachment".
>>
>> See https://pastebin.com/KwN1Vzsp for the full log entries related to
>> this removal.
>>
>> It's not lying, the volume no longer exists in the rbd pool, but the
>> cinder database still thinks it's attached, even though I was never able
>> to get it to attach to a VM.
>>
>> What are my options for cleaning up this stale disk in the cinder database?
>>
>> How can I update the driver options in my storage domain (deleting and
>> recreating the domain is acceptable, if possible)?
>>
>> --Mike
>>
> 
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/3WIVWLKS347QKA2GMIGF4ZEMLFBJQ7SU/


[ovirt-users] Re: Latest ManagedBlockDevice documentation

2020-10-14 Thread Michael Thomas

On 10/14/20 3:30 AM, Benny Zlotnik wrote:

Jeff is right, it's a limitation of kernel rbd, the recommendation is
to add `rbd default features = 3` to the configuration. I think there
are plans to support rbd-nbd in cinderlib which would allow using
additional features, but I'm not aware of anything concrete.

Additionally, the path for the cinderlib log is
/var/log/ovirt-engine/cinderlib/cinderlib.log, the error in this case
would appear in the vdsm.log on the relevant host, and would look
something like "RBD image feature set mismatch. You can disable
features unsupported by the kernel with 'rbd feature disable'"


Thanks for the pointer!  Indeed, 
/var/log/ovirt-engine/cinderlib/cinderlib.log has the errors that I was 
looking for.  In this case, it was a user error entering the RBDDriver 
options:



2020-10-13 15:15:25,640 - cinderlib.cinderlib - WARNING - Unknown config 
option use_multipath_for_xfer


...it should have been 'use_multipath_for_image_xfer'.

Now my attempts to fix it are failing...  If I go to 'Storage -> Storage 
Domains -> Manage Domain', all driver options are uneditable except for 
'Name'.


Then I thought that maybe I can't edit the driver options while a disk 
still exists, so I tried removing the one disk in this domain.  But even 
after multiple attempts, it still fails with:


2020-10-14 07:26:31,340 - cinder.volume.drivers.rbd - INFO - volume 
volume-5419640e-445f-4b3f-a29d-b316ad031b7a no longer exists in backend
2020-10-14 07:26:31,353 - cinderlib-client - ERROR - Failure occurred 
when trying to run command 'delete_volume': (psycopg2.IntegrityError) 
update or delete on table "volumes" violates foreign key constraint 
"volume_attachment_volume_id_fkey" on table "volume_attachment"
DETAIL:  Key (id)=(5419640e-445f-4b3f-a29d-b316ad031b7a) is still 
referenced from table "volume_attachment".


See https://pastebin.com/KwN1Vzsp for the full log entries related to 
this removal.


It's not lying, the volume no longer exists in the rbd pool, but the 
cinder database still thinks it's attached, even though I was never able 
to get it to attach to a VM.


What are my options for cleaning up this stale disk in the cinder database?

How can I update the driver options in my storage domain (deleting and 
recreating the domain is acceptable, if possible)?


--Mike
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/XTULDZ4DON6E4KMXQ5NVZQIZTRK4CZPQ/


[ovirt-users] Re: Latest ManagedBlockDevice documentation

2020-10-13 Thread Michael Thomas
To verify that it's not a cephx permission issue, I tried accessing the 
block storage from both the engine and the ovirt node using the 
credentials I set up in the ManagedBlockStorage setup page:


[root@ovirt4]# rbd --id ovirt ls rbd.ovirt.data
volume-5419640e-445f-4b3f-a29d-b316ad031b7a
[root@ovirt4]# rbd --id ovirt info 
rbd.ovirt.data/volume-5419640e-445f-4b3f-a29d-b316ad031b7a

rbd image 'volume-5419640e-445f-4b3f-a29d-b316ad031b7a':
size 100 GiB in 25600 objects
order 22 (4 MiB objects)
snapshot_count: 0
id: 68a7cd6aeb3924
block_name_prefix: rbd_data.68a7cd6aeb3924
format: 2
features: layering, exclusive-lock, object-map, fast-diff, 
deep-flatten

op_features:
flags:
create_timestamp: Tue Oct 13 06:53:55 2020
access_timestamp: Tue Oct 13 06:53:55 2020
modify_timestamp: Tue Oct 13 06:53:55 2020

Where else can I look to see where it's failing?

--Mike

On 9/30/20 2:19 AM, Benny Zlotnik wrote:

When you ran `engine-setup` did you enable cinderlib preview (it will
not be enabled by default)?
It should handle the creation of the database automatically, if you
didn't you can enable it by running:
`engine-setup --reconfigure-optional-components`


On Wed, Sep 30, 2020 at 1:58 AM Michael Thomas  wrote:


Hi Benny,

Thanks for the confirmation.  I've installed openstack-ussuri and ceph
Octopus.  Then I tried using these instructions, as well as the deep
dive that Eyal has posted at https://www.youtube.com/watch?v=F3JttBkjsX8.

I've done this a couple of times, and each time the engine fails when I
try to add the new managed block storage domain.  The error on the
screen indicates that it can't connect to the cinder database.  The
error in the engine log is:

2020-09-29 17:02:11,859-05 WARN
[org.ovirt.engine.core.bll.storage.domain.AddManagedBlockStorageDomainCommand]
(default task-2) [d519088c-7956-4078-b5cf-156e5b3f1e59] Validation of
action 'AddManagedBlockStorageDomain' failed for user
admin@internal-authz. Reasons:
VAR__TYPE__STORAGE__DOMAIN,VAR__ACTION__ADD,ACTION_TYPE_FAILED_CINDERLIB_DATA_BASE_REQUIRED,ACTION_TYPE_FAILED_CINDERLIB_DATA_BASE_REQUIRED

I had created the db on the engine with this command:

su - postgres -c "psql -d template1 -c \"create database cinder owner
engine template template0 encoding 'UTF8' lc_collate 'en_US.UTF-8'
lc_ctype 'en_US.UTF-8';\""

...and added the following to the end of /var/lib/pgsql/data/pg_hba.conf:

  host    cinder  engine  ::0/0       md5
  host    cinder  engine  0.0.0.0/0   md5

Is there anywhere else I should look to find out what may have gone wrong?

--Mike

On 9/29/20 3:34 PM, Benny Zlotnik wrote:

The feature is currently in tech preview, but it's being worked on.
The feature page is outdated,  but I believe this is what most users
in the mailing list were using. We held off on updating it because the
installation instructions have been a moving target, but it is more
stable now and I will update it soon.

Specifically speaking, the openstack version should be updated to
train (it is likely ussuri works fine too, but I haven't tried it) and
cinderlib has an RPM now (python3-cinderlib)[1], so it can be
installed instead of using pip, same goes for os-brick. The rest of
the information is valid.


[1] http://mirror.centos.org/centos/8/cloud/x86_64/openstack-ussuri/Packages/p/

On Tue, Sep 29, 2020 at 10:37 PM Michael Thomas  wrote:


I'm looking for the latest documentation for setting up a Managed Block
Device storage domain so that I can move some of my VM images to ceph rbd.

I found this:

https://ovirt.org/develop/release-management/features/storage/cinderlib-integration.html

...but it has a big note at the top that it is "...not user
documentation and should not be treated as such."

The oVirt administration guide[1] does not talk about managed block devices.

I've found a few mailing list threads that discuss people setting up a
Managed Block Device with ceph, but didn't see any links to
documentation steps that folks were following.

Is the Managed Block Storage domain a supported feature in oVirt 4.4.2,
and if so, where is the documentation for using it?

--Mike
[1]ovirt.org/documentation/administration_guide/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/KHCLXVOCELHOR3G7SH3GDPGRKITCW7UY/







___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/a

[ovirt-users] Re: Latest ManagedBlockDevice documentation

2020-10-13 Thread Michael Thomas

Setting both http_proxy and https_proxy fixed the issue.
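
For the record, this is roughly what I ran (the proxy URL is a placeholder):

http_proxy=http://proxy.example.com:3128 \
https_proxy=http://proxy.example.com:3128 \
engine-setup --reconfigure-optional-components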

Thanks for the tip!

--Mike


I am not sure, it's been a long time since I tried that.

Feel free to file a bug.

You can also try setting env var 'http_proxy' for engine-setup, e.g.:

http_proxy=MY_PROXY_URL engine-setup --reconfigure-optional-components

Alternatively, you can also add '--offline' to engine-setup cmd, and then it
won't do any package management (not try to update, check for updates, etc.).

Best regards,

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1392312



--Mike


On 9/30/20 2:19 AM, Benny Zlotnik wrote:

When you ran `engine-setup` did you enable cinderlib preview (it will
not be enabled by default)?
It should handle the creation of the database automatically, if you
didn't you can enable it by running:
`engine-setup --reconfigure-optional-components`


On Wed, Sep 30, 2020 at 1:58 AM Michael Thomas  wrote:


Hi Benny,

Thanks for the confirmation.  I've installed openstack-ussuri and ceph
Octopus.  Then I tried using these instructions, as well as the deep
dive that Eyal has posted at https://www.youtube.com/watch?v=F3JttBkjsX8.

I've done this a couple of times, and each time the engine fails when I
try to add the new managed block storage domain.  The error on the
screen indicates that it can't connect to the cinder database.  The
error in the engine log is:

2020-09-29 17:02:11,859-05 WARN
[org.ovirt.engine.core.bll.storage.domain.AddManagedBlockStorageDomainCommand]
(default task-2) [d519088c-7956-4078-b5cf-156e5b3f1e59] Validation of
action 'AddManagedBlockStorageDomain' failed for user
admin@internal-authz. Reasons:
VAR__TYPE__STORAGE__DOMAIN,VAR__ACTION__ADD,ACTION_TYPE_FAILED_CINDERLIB_DATA_BASE_REQUIRED,ACTION_TYPE_FAILED_CINDERLIB_DATA_BASE_REQUIRED

I had created the db on the engine with this command:

su - postgres -c "psql -d template1 -c \"create database cinder owner
engine template template0 encoding 'UTF8' lc_collate 'en_US.UTF-8'
lc_ctype 'en_US.UTF-8';\""

...and added the following to the end of /var/lib/pgsql/data/pg_hba.conf:

   host    cinder  engine  ::0/0       md5
   host    cinder  engine  0.0.0.0/0   md5

Is there anywhere else I should look to find out what may have gone wrong?

--Mike

On 9/29/20 3:34 PM, Benny Zlotnik wrote:

The feature is currently in tech preview, but it's being worked on.
The feature page is outdated,  but I believe this is what most users
in the mailing list were using. We held off on updating it because the
installation instructions have been a moving target, but it is more
stable now and I will update it soon.

Specifically speaking, the openstack version should be updated to
train (it is likely ussuri works fine too, but I haven't tried it) and
cinderlib has an RPM now (python3-cinderlib)[1], so it can be
installed instead of using pip, same goes for os-brick. The rest of
the information is valid.


[1] http://mirror.centos.org/centos/8/cloud/x86_64/openstack-ussuri/Packages/p/

On Tue, Sep 29, 2020 at 10:37 PM Michael Thomas  wrote:


I'm looking for the latest documentation for setting up a Managed Block
Device storage domain so that I can move some of my VM images to ceph rbd.

I found this:

https://ovirt.org/develop/release-management/features/storage/cinderlib-integration.html

...but it has a big note at the top that it is "...not user
documentation and should not be treated as such."

The oVirt administration guide[1] does not talk about managed block devices.

I've found a few mailing list threads that discuss people setting up a
Managed Block Device with ceph, but didn't see any links to
documentation steps that folks were following.

Is the Managed Block Storage domain a supported feature in oVirt 4.4.2,
and if so, where is the documentation for using it?

--Mike
[1]ovirt.org/documentation/administration_guide/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/KHCLXVOCELHOR3G7SH3GDPGRKITCW7UY/














___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/2KT3QQZZESXOTSQFBZZXYDH5WNZVKMJZ/


[ovirt-users] Re: CEPH - Opinions and ROI

2020-10-04 Thread thomas
Thanks a lot for this feedback!

I've never had any practical experience with Ceph, MooseFS, BeeGFS or Lustre: 
GlusterFS to me mostly had the charm of running on 1/2/3 nodes and then anything 
beyond that with a balanced trade-off between resilience and performance... in 
theory, of course.

And on top of that, the fact that (without sharding) you'd always have access to 
the files on the file system below if hell broke loose was a great help in building 
up enough confidence to go and try it.

Politics and the real world came much later, and from my experience with TSO, 370s, 
Lotus Notes and QuickTransit, I can appreciate the destructive power of IBM: let's 
just hope they don't give in to the temptation of "streamlining their offerings" in 
the wrong direction.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/GMCOJLTBOTWBXW6HMREJVDOPL265URTG/


[ovirt-users] Re: CEPH - Opinions and ROI

2020-10-04 Thread thomas
Thank you very much for your story!
It has very much confirmed a few suspicions that have been gathering over the 
last... oh my God, has it been two years already?

1. Don't expect plug-and-play unless you're on SAN or NFS (even HCI doesn't seem 
to be dear to the heart of the oVirt team)
2. Don't expect RHV to be anything but more expensive than oVirt

I am running oVirt at home on silent/passive Atoms, because that represents an 
edge use case for me, where the oVirt HCI variant potentially has its best 
value proposition: Unfortunately it's insignificant in terms of revenue...

I am also running it in a corporate R&D lab on old recycled servers, where there 
is no other storage either, and I simply don't want to invest in a new Ceph skill 
when Gluster-based HCI should do it out of the box.

IMHO Red Hat can't afford to continue treating Gluster the way they seem to: 
without HCI, oVirt is dead for me, and Gluster on its own is the superior concept 
to quite a few alternatives. If anything, I'd want Gluster on hardware like 
Fungible DPUs for mind-boggling HPC throughput.

As far as I understand, they have just cancelled a major Gluster refactoring, but 
if that is what it takes, they may just have to start a little smaller and do it 
anyway.

And of course I want Gluster to switch between single node, replication and 
dispersion seamlessly and on the fly, as well as to get much better diagnostic 
tools.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/2WPRWSLJNEU6NH74UG73G5LQFOPZXW2H/


[ovirt-users] Re: ovirt-engine and host certification is expired in ovirt4.0

2020-10-04 Thread thomas
From what I have observed (but it's not something I try often), if you enable 
maintenance on a host that still has VMs on it, it will try migrating the VMs 
first, which is a copy-first, transfer-state-afterwards process. So if there is no 
migration target available, or if the copying and state transfer fail, the VM will 
simply continue to run on the original host... and the host will refuse to go into 
maintenance.

It doesn't solve your problem, but the loss of service you fear shouldn't happen 
either... except that sometimes oVirt has bugs, or the resulting network activity 
causes confusion.

Ah, perhaps this is important: I've only ever tried that by setting a host into 
maintenance (typically for patch updates) via the GUI. I am far less convinced that 
VM migration would also be triggered if you use the 'hosted-engine 
--set-maintenance --mode=local' variant on the host that runs the HostedEngine VM. 
That might just make it unavailable for newly started VMs.
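
For reference, the two variants I mean (CLI syntax from memory, so double-check):

# GUI: Compute > Hosts > (host) > Management > Maintenance -- migrates the VMs first
# CLI, hosted-engine HA agent only; as far as I know this does NOT migrate regular VMs:
hosted-engine --set-maintenance --mode=local
hosted-engine --set-maintenance --mode=none    # leave local maintenance again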
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/7D6XC4YHIKMWSCJWZC2TJFMMD27PT4LD/


[ovirt-users] Re: Upgrade oVirt Host from 4.4.0 to 4.4.2 fails

2020-10-02 Thread Michael Thomas
This is a shot in the dark, but it's possible that your dnf command was 
running off of cached repo metadata.


Try running 'dnf clean metadata' before 'dnf upgrade'.
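
In other words, something along these lines, reusing the commands you already ran:

dnf install https://resources.ovirt.org/pub/yum-repo/ovirt-release44.rpm
dnf clean metadata
dnf upgrade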

--Mike

On 10/2/20 12:38 PM, Erez Zarum wrote:

Hey,
A bunch of hosts were installed from the oVirt Node image; I have upgraded the 
self-hosted engine successfully.
I ran Check Upgrade on one of the hosts and it was eligible for an upgrade.
I used the UI to let it upgrade; after multiple retries it always fails on "Prepare 
NGN host for upgrade.", so I chose another host as a test.
I set that host into Maintenance and let all the VMs migrate successfully,
made sure I have the latest 4.4.2 repo (it was 4.4.0) via yum install 
https://resources.ovirt.org/pub/yum-repo/ovirt-release44.rpm
and issued "dnf upgrade":
Installing:
  ovirt-openvswitch
  replacing  openvswitch.x86_64 2.11.1-5.el8
  ovirt-openvswitch-ovn
  replacing  ovn.x86_64 2.11.1-5.el8
  ovirt-openvswitch-ovn-host
  replacing  ovn-host.x86_64 2.11.1-5.el8
  ovirt-python-openvswitch
  replacing  python3-openvswitch.x86_64 2.11.1-5.el8
Upgrading:
  ovirt-node-ng-image-update-placeholder
Installing dependencies:
  openvswitch2.11
  ovirt-openvswitch-ovn-common
  ovn2.11
  ovn2.11-host
  python3-openvswitch2.11
Installing weak dependencies:
  network-scripts-openvswitch
  network-scripts-openvswitch2.11

It was very quick, but nothing else happened. I did try to reboot the host, but I 
still see the host as oVirt 4.4.0 and, as expected, it still says that an update is 
available.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/WL67T6DNFNS3QOZ2ZHK75JXTCWHIECFD/


___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/I6N6JLKNECFFP5TRSYYAQWLTUTLJKAF7/


[ovirt-users] Re: Problem with Cluster-wise BIOS Settings in oVirt 4.4

2020-10-01 Thread thomas
As far as I know, there is no alternative to reinstalling it... The management 
engine inherits its configuration from the cluster, and in this case, once you have 
changed it, it lacks the virtualized hardware to boot; and since you no longer have 
a GUI, you can't change it back either... You're not the first to fall into that 
trap! The same thing happened to me...

I had the same issue and I find it much too easy to fall into.

In my case, because on 4.4 the cluster default is Q35, some of my older 
i440FX-based VMs failed to work, because Ethernet devices got renamed on the Q35 
"hardware". So I went to change the default config on the cluster to not enforce 
the Q35 base, and then ran into the new management engine failing to start, because 
it didn't like the i440FX base hardware it inherited from the cluster, even though 
it had been running as a Q35 machine after installation and should perhaps have 
retained that.
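
(A quick way to see which machine type a given VM actually got, on the host where 
it runs; read-only virsh should work on an oVirt host, and the VM name is a 
placeholder:)

virsh -r dumpxml <vm-name> | grep -i 'machine='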

Since oVirt seems to re-synthesize virtual machine hardware on every startup, 
the rules on how the machines are re-constituted perhaps need to be better 
described and controlled, especially in these migration scenarios.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/PPDKVDVBALNHZT5MVFMOLDCDBLFBIYZM/


[ovirt-users] Re: Latest ManagedBlockDevice documentation

2020-09-30 Thread Michael Thomas
I hadn't installed the necessary packages when the engine was first 
installed.


However, running 'engine-setup --reconfigure-optional-components' 
doesn't work at the moment because (by design) my engine does not have a 
network route outside of the cluster.  It fails with:


[ INFO  ] DNF Errors during downloading metadata for repository 'AppStream':
   - Curl error (7): Couldn't connect to server for 
http://mirrorlist.centos.org/?release=8&arch=x86_64&repo=AppStream&infra=$infra 
[Failed to connect to mirrorlist.centos.org port 80: Network is unreachable]
[ ERROR ] DNF Failed to download metadata for repo 'AppStream': Cannot 
prepare internal mirrorlist: Curl error (7): Couldn't connect to server 
for 
http://mirrorlist.centos.org/?release=8&arch=x86_64&repo=AppStream&infra=$infra 
[Failed to connect to mirrorlist.centos.org port 80: Network is unreachable]



I have a proxy set in the engine's /etc/dnf/dnf.conf, but it doesn't 
seem to be obeyed when running engine-setup.  Is there another way that 
I can get engine-setup to use a proxy?


--Mike


On 9/30/20 2:19 AM, Benny Zlotnik wrote:

When you ran `engine-setup` did you enable cinderlib preview (it will
not be enabled by default)?
It should handle the creation of the database automatically, if you
didn't you can enable it by running:
`engine-setup --reconfigure-optional-components`


On Wed, Sep 30, 2020 at 1:58 AM Michael Thomas  wrote:


Hi Benny,

Thanks for the confirmation.  I've installed openstack-ussuri and ceph
Octopus.  Then I tried using these instructions, as well as the deep
dive that Eyal has posted at https://www.youtube.com/watch?v=F3JttBkjsX8.

I've done this a couple of times, and each time the engine fails when I
try to add the new managed block storage domain.  The error on the
screen indicates that it can't connect to the cinder database.  The
error in the engine log is:

2020-09-29 17:02:11,859-05 WARN
[org.ovirt.engine.core.bll.storage.domain.AddManagedBlockStorageDomainCommand]
(default task-2) [d519088c-7956-4078-b5cf-156e5b3f1e59] Validation of
action 'AddManagedBlockStorageDomain' failed for user
admin@internal-authz. Reasons:
VAR__TYPE__STORAGE__DOMAIN,VAR__ACTION__ADD,ACTION_TYPE_FAILED_CINDERLIB_DATA_BASE_REQUIRED,ACTION_TYPE_FAILED_CINDERLIB_DATA_BASE_REQUIRED

I had created the db on the engine with this command:

su - postgres -c "psql -d template1 -c \"create database cinder owner
engine template template0 encoding 'UTF8' lc_collate 'en_US.UTF-8'
lc_ctype 'en_US.UTF-8';\""

...and added the following to the end of /var/lib/pgsql/data/pg_hba.conf:

  host    cinder  engine  ::0/0       md5
  host    cinder  engine  0.0.0.0/0   md5

Is there anywhere else I should look to find out what may have gone wrong?

--Mike

On 9/29/20 3:34 PM, Benny Zlotnik wrote:

The feature is currently in tech preview, but it's being worked on.
The feature page is outdated,  but I believe this is what most users
in the mailing list were using. We held off on updating it because the
installation instructions have been a moving target, but it is more
stable now and I will update it soon.

Specifically speaking, the openstack version should be updated to
train (it is likely ussuri works fine too, but I haven't tried it) and
cinderlib has an RPM now (python3-cinderlib)[1], so it can be
installed instead of using pip, same goes for os-brick. The rest of
the information is valid.


[1] http://mirror.centos.org/centos/8/cloud/x86_64/openstack-ussuri/Packages/p/

On Tue, Sep 29, 2020 at 10:37 PM Michael Thomas  wrote:


I'm looking for the latest documentation for setting up a Managed Block
Device storage domain so that I can move some of my VM images to ceph rbd.

I found this:

https://ovirt.org/develop/release-management/features/storage/cinderlib-integration.html

...but it has a big note at the top that it is "...not user
documentation and should not be treated as such."

The oVirt administration guide[1] does not talk about managed block devices.

I've found a few mailing list threads that discuss people setting up a
Managed Block Device with ceph, but didn't see any links to
documentation steps that folks were following.

Is the Managed Block Storage domain a supported feature in oVirt 4.4.2,
and if so, where is the documentation for using it?

--Mike
[1]ovirt.org/documentation/administration_guide/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/KHCLXVOCELHOR3G7SH3GDPGRKITCW7UY/







___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www

[ovirt-users] Re: Latest ManagedBlockDevice documentation

2020-09-29 Thread Michael Thomas

Hi Benny,

Thanks for the confirmation.  I've installed openstack-ussuri and ceph 
Octopus.  Then I tried using these instructions, as well as the deep 
dive that Eyal has posted at https://www.youtube.com/watch?v=F3JttBkjsX8.


I've done this a couple of times, and each time the engine fails when I 
try to add the new managed block storage domain.  The error on the 
screen indicates that it can't connect to the cinder database.  The 
error in the engine log is:


2020-09-29 17:02:11,859-05 WARN 
[org.ovirt.engine.core.bll.storage.domain.AddManagedBlockStorageDomainCommand] 
(default task-2) [d519088c-7956-4078-b5cf-156e5b3f1e59] Validation of 
action 'AddManagedBlockStorageDomain' failed for user 
admin@internal-authz. Reasons: 
VAR__TYPE__STORAGE__DOMAIN,VAR__ACTION__ADD,ACTION_TYPE_FAILED_CINDERLIB_DATA_BASE_REQUIRED,ACTION_TYPE_FAILED_CINDERLIB_DATA_BASE_REQUIRED


I had created the db on the engine with this command:

su - postgres -c "psql -d template1 -c \"create database cinder owner 
engine template template0 encoding 'UTF8' lc_collate 'en_US.UTF-8' 
lc_ctype 'en_US.UTF-8';\""


...and added the following to the end of /var/lib/pgsql/data/pg_hba.conf:

host    cinder  engine  ::0/0       md5
host    cinder  engine  0.0.0.0/0   md5
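
(For completeness: pg_hba.conf changes only take effect after a reload, e.g. 
roughly like this; the psql path may differ if PostgreSQL comes from a module/SCL:)

su - postgres -c "psql -c 'SELECT pg_reload_conf();'"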

Is there anywhere else I should look to find out what may have gone wrong?

--Mike

On 9/29/20 3:34 PM, Benny Zlotnik wrote:

The feature is currently in tech preview, but it's being worked on.
The feature page is outdated,  but I believe this is what most users
in the mailing list were using. We held off on updating it because the
installation instructions have been a moving target, but it is more
stable now and I will update it soon.

Specifically speaking, the openstack version should be updated to
train (it is likely ussuri works fine too, but I haven't tried it) and
cinderlib has an RPM now (python3-cinderlib)[1], so it can be
installed instead of using pip, same goes for os-brick. The rest of
the information is valid.


[1] http://mirror.centos.org/centos/8/cloud/x86_64/openstack-ussuri/Packages/p/

On Tue, Sep 29, 2020 at 10:37 PM Michael Thomas  wrote:


I'm looking for the latest documentation for setting up a Managed Block
Device storage domain so that I can move some of my VM images to ceph rbd.

I found this:

https://ovirt.org/develop/release-management/features/storage/cinderlib-integration.html

...but it has a big note at the top that it is "...not user
documentation and should not be treated as such."

The oVirt administration guide[1] does not talk about managed block devices.

I've found a few mailing list threads that discuss people setting up a
Managed Block Device with ceph, but didn't see any links to
documentation steps that folks were following.

Is the Managed Block Storage domain a supported feature in oVirt 4.4.2,
and if so, where is the documentation for using it?

--Mike
[1]ovirt.org/documentation/administration_guide/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/KHCLXVOCELHOR3G7SH3GDPGRKITCW7UY/



___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/WUYG5H2T4ODBS3YCOTNHJMUBCKMFMATI/


[ovirt-users] Latest ManagedBlockDevice documentation

2020-09-29 Thread Michael Thomas
I'm looking for the latest documentation for setting up a Managed Block 
Device storage domain so that I can move some of my VM images to ceph rbd.


I found this:

https://ovirt.org/develop/release-management/features/storage/cinderlib-integration.html

...but it has a big note at the top that it is "...not user 
documentation and should not be treated as such."


The oVirt administration guide[1] does not talk about managed block devices.

I've found a few mailing list threads that discuss people setting up a 
Managed Block Device with ceph, but didn't see any links to 
documentation steps that folks were following.


Is the Managed Block Storage domain a supported feature in oVirt 4.4.2, 
and if so, where is the documentation for using it?


--Mike
[1]ovirt.org/documentation/administration_guide/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/KHCLXVOCELHOR3G7SH3GDPGRKITCW7UY/


[ovirt-users] Single Node HCI upgrade procedure from CentOS7/oVirt 4.3 to CentOS8/oVirt 4.4?

2020-09-26 Thread thomas
I can hear you saying: "You did understand that single node HCI is just a toy, 
right?"

For me the primary use of a single node HCI is adding some disaster resilience 
in small server edge type scenarios, where a three node HCI provides the fault 
tolerance: 3+1 with a bit of distance, warm or even cold stand-by, potentially 
manual switch and reduced workload in case disaster strikes.

Of course, another 3nHCI would be better, but who gets that type of budget, 
right?

What I am trying say: If you want oVirt to gain market share, try to give HCI 
more love. And while you're at it, try to make expanding from 1nHCI to 3nHCI 
(and higher counts) a standard operational procedure to allow expanding a 
disaster stand-by into a production setup, while the original 3nHCI is being 
rebuilt.

For me low-budget HCI is where oVirt has its biggest competitive advantage 
against vSan and Nutanix, so please don't treat the HCI/gluster variant like an 
unwanted child any more.

In the mean-time OVA imports (from 4.3.10 exports) on my 4.4.2 1nHCI fail 
again, which I'll report separately.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/QI3Z45SRJD72ZJIX6HZCVC7DVVSZCKUW/


[ovirt-users] Re: Node upgrade to 4.4

2020-09-26 Thread thomas
I was looking forward to that presentation for exactly that reason: But it 
completely bypassed the HCI scenario, was very light on details and of course 
assumed that everything would just work, because there is no easy fail-back and 
you're probably better off taking down the complete farm during the upgrade. If 
you are actually depending on that farm to operate,...

In the mean-time I try to make sure I have a plan-B and C solution in case the 
official procedure fails, but have *.ova (plan-C) imports fail (again) in 
extract_ova.py on the single-node 4.4.2 HCI cluster that I have set up as 
target to do migration testing... a bug far too familiar for a brand-new release.

I'll be testing re-attachable NFS domains next (plan-B), but for me *.ova 
exports and imports are still the most fundamental operations any hypervisor 
needs to support: If OVA is the wrong file format, fine, just do another one. 
But IMHO turning a machine into a file and a file into a machine is the very 
definition of what hypervisors do. And a release that doesn't seem to test this 
operation, does not rhyme with "solution for your entire enterprise" on the 
oVirt home page.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/ZLPQWUQ2UPTGKXICQNZUHAT45TKQC5OO/


[ovirt-users] Re: Node upgrade to 4.4

2020-09-24 Thread thomas
I am hoping for a miracle like that, too.

In the mean-time I am trying to make sure that all variants of exports and 
imports from *.ova to re-attachable NFS domains work properly, in case I have 
to start from scratch.

HCI upgrades don't get the special love you'd expect after RHV's proud 
announcement that they are now ready to take on Nutanix and vSAN.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/C2HVZDUABWKNFN4IJD2ILLQF5E2DUUBU/


[ovirt-users] Re: Node upgrade to 4.4

2020-09-23 Thread Michael Thomas
Not to give you any false hope, but when I recently reinstalled my oVirt 
4.4.2 cluster, I left the gluster disks alone and only reformatted the 
OS disks.  Much to my surprise, after running the oVirt HCI wizard on 
this new installation (using the exact same gluster settings as before), 
the original contents of my gluster-based data domain were still intact.


I certainly wouldn't count on this behavior with any important data, though.

--Mike

On 9/23/20 1:40 PM, Vincent Royer wrote:

well that sounds like a risky nightmare. I appreciate your help.

*Vincent Royer*
*778-825-1057*



*SUSTAINABLE MOBILE ENERGY SOLUTIONS*





On Wed, Sep 23, 2020 at 11:31 AM Strahil Nikolov 
wrote:


Before you reinstall the node, you should use 'gluster volume
remove-brick <VOLNAME> replica <COUNT> ovirt_node:/path-to-brick' to
reduce the volume to replica 2 (for example). Then you need to 'gluster
peer detach ovirt_node' in order to fully cleanup the gluster TSP.

You will have to remove the bricks that are on that < ovirt_node > before
detaching it.

Once you reinstall with EL 8, you can 'gluster peer probe
<ovirt_node>' and then 'gluster volume add-brick <VOLNAME> replica
<COUNT> reinstalled_ovirt_node:/path-to-brick'.

Note that reusing bricks is not very easy, so just wipe the data via
'mkfs.xfs -i size=512 /dev/block/device'.

Once all volumes are again a replica 3 , just wait for the healing to go
over and you can proceed with the oVirt part.
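
As a concrete sketch with made-up names (volume 'data', node 'ovirt3', bricks under /gluster_bricks, LV name is a placeholder; adjust to your own layout), the cycle would look roughly like this:

# drop the brick of the node being reinstalled and reduce the replica count
gluster volume remove-brick data replica 2 ovirt3:/gluster_bricks/data/data force
gluster peer detach ovirt3
# ... reinstall ovirt3 with EL 8, then recreate the brick filesystem ...
mkfs.xfs -f -i size=512 /dev/gluster_vg/gluster_lv_data
# rejoin the pool and restore replica 3
gluster peer probe ovirt3
gluster volume add-brick data replica 3 ovirt3:/gluster_bricks/data/data
gluster volume heal data info summary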

Best Regards,
Strahil Nikolov






On Wednesday, September 23, 2020, 20:45:30 GMT+3, Vincent Royer <
vinc...@epicenergy.ca> wrote:





My confusion is that those documents do not describe any gluster related
tasks for Ovirt Nodes.  When I take a node down and install Ovirt Node 4.4
on it, won't all the gluster bricks on that node be lost?  The part
describing "preserving local storage", that isn't anything about Gluster,
correct?


Vincent Royer
778-825-1057


SUSTAINABLE MOBILE ENERGY SOLUTIONS





On Tue, Sep 22, 2020 at 8:31 PM Ritesh Chikatwar 
wrote:

Vincent,


This document will be useful


https://www.ovirt.org/documentation/upgrade_guide/#Upgrading_the_Manager_to_4-4_4-3_SHE


On Wed, Sep 23, 2020, 3:55 AM Vincent Royer 

wrote:

I have 3 nodes running node ng 4.3.9 with a gluster/hci cluster.  How

do I upgrade to 4.4?  Is there a guide?

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct:

https://www.ovirt.org/community/about/community-guidelines/

List Archives:

https://lists.ovirt.org/archives/list/users@ovirt.org/message/TCX2RUE5RN7RNB45UWBXZ4SKH6KT7ZFC/





___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct:
https://www.ovirt.org/community/about/community-guidelines/
List Archives:
https://lists.ovirt.org/archives/list/users@ovirt.org/message/J6IERH7OAO6JJ423A3K2KU2R25YXU2NF/




___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/NXLHNX2ABBGAAJZXVRDJODX3H2WF7BGR/


___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/FVZMDQQPQ3PIEHSSDTF52VY5U7337RUM/


[ovirt-users] Posts not updating on lists.ovirt.org web site since September 10th?

2020-09-15 Thread thomas
I can see new posts coming in via e-mail, but updates on the web sites have 
stopped and posts don't disappear?
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives:


[ovirt-users] Re: Gluster quorum issue on 3-node HCI with extra 5-nodes as compute and storage nodes

2020-09-14 Thread Thomas Hoberg

On 14.09.2020 at 15:23, tho...@hoberg.net wrote:

Sorry two times now:
1. It is a duplicate post, because the delay for posts to show up on the 
web site is ever longer (as I am responding via mail, the first post is 
still not shown...)


2. It seems to have been a wild goose chase: The gluster daemon from 
group B did eventually regain quorum (or returned to its senses) some 
time later... the error message is pretty scary and IMHO somewhat 
misleading, but...


With oVirt one must learn to be patient, evidently all that self-healing 
built-in depends on state machines turning their cogs and gears, not on 
admins pushing for things to happen... sorry!

Yes, I've also posted this on the Gluster Slack. But I am using Gluster mostly 
because it's part of oVirt HCI, so don't just send me away, please!

Problem: GlusterD refusing to start due to quorum issues for volumes where it 
isn’t contributing any brick

(I've had this before on a different farm, but there it was transitory. Now I 
have it in a more observable manner, that's why I open a new topic)

In a test farm with recycled servers, I started running Gluster via oVirt 
3node-HCI, because I got 3 machines originally.
They were set up as group A in a 2:1 (replica:arbiter) oVirt HCI setup with 
'engine', 'vmstore' and 'data' volumes, one brick on each node.

I then got another five machines with hardware specs that were rather different 
to group A, so I set those up as group B to mostly act as compute nodes, but 
also to provide extra storage, mostly to be used externally as GlusterFS 
shares. It took a bit of fiddling with Ansible but I got these 5 nodes to serve 
two more Gluster volumes 'tape' and 'scratch' using dispersed bricks (4 
disperse:1 redundancy), RAID5 in my mind.

The two groups are in one Gluster, not because they serve bricks to the same 
volumes, but because oVirt doesn't like nodes to be in different Glusters (or 
actually, to already be in a Gluster when you add them as host node). But the 
two groups provide bricks to distinct volumes, there is no overlap.

After setup things have been running fine for weeks, but now I needed to 
restart a machine from group B, which has ‘tape’ and ‘scratch’ bricks, but none 
from original oVirt ‘engine’, ‘vmstore’ and ‘data’ in group A. Yet the gluster 
daemon refuses to start, citing a loss of quorum for these three volumes, even 
if it has no bricks in them… which makes no sense to me.

I am afraid the source of the issue is a conceptual one: I clearly don't really 
understand some design assumptions of Gluster.
And I'm afraid the design assumptions of Gluster and of oVirt (even with HCI), 
are not as related as one might assume from the marketing materials on the 
oVirt home-page.

But most of all I'd like to know: How do I fix this now?

I can't heal 'tape' and 'scratch', which are growing ever more apart while the 
glusterd on this machine in group B refuses to come online for lack of a quorum 
on volumes where it is not contributing bricks.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives:



___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives:


[ovirt-users] oVirt HCI issue: GlusterD refusing to start due to quorum issues for volumes where it isn’t contributing any brick

2020-09-14 Thread thomas
Sorry if it's a duplicate: I got an error on posting... And yes I posted it on the 
Gluster slack first, but I am using Gluster only because the marketing on oVirt 
HCI worked so well...

I got 3 recycled servers for an oVirt test environment first and set those up 
as 3-node HCI using defaults mostly, 2 replica + 1 arbiter, 'engine', 'vmstore' 
and 'data' volumes with single bricks for each node. I call these group A.

Then I got another set of five machines, let's call them group B, with somewhat 
different hardware characteristics than group A, but nicely similar between 
themselves. I wanted to add these to the farm as compute nodes but also use 
their storage as general GlusterFS storage for a wider use.

Group B machines were added as hosts and set up to run hosted-engine, but they 
do not contribute bricks to the normal oVirt volumes 'engine', 'vmstore' or 
'data'. With some Ansible trickery I managed to set up two dispersed volumes (4 
data: 1 redundancy) on group B 'scratch' and 'tape', mostly for external 
GlusterFS use. oVirt picked them up automagically, so I guess they could also 
be used with VMs.

I expect to get more machines and adding them one-by-one to dispersed volumes 
with a fine balance between capacity and redundancy made me so enthusiastic 
about oVirt HCI in the first place...

After some weeks of fine operation I had to restart a machine from group B for 
maintenance. When it came back up, GlusterD refuses to come online, because it 
doesn't have "quorum for volumes 'engine', 'vmstore' and 'data'"

It's a small surprise it doesn't *have* quorum, what's a bigger surprise is 
that it *asks* for quorum in a volume where it's not contributing any bricks. 
What's worse is that it then refuses to start serving its bricks for 'scratch' 
and 'tape', which are now growing apart without any chance of healing.

How do I fix this?

Is this a bug (my interpretation) or do I fundamentally misunderstand how 
Gluster as a hyper scale out file system is supposed to work with potentially 
thousands of hosts contributing each dozens of bricks to each of hundreds of 
volumes in a single name space?
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives:


[ovirt-users] Gluster quorum issue on 3-node HCI with extra 5-nodes as compute and storage nodes

2020-09-14 Thread thomas
Yes, I've also posted this on the Gluster Slack. But I am using Gluster mostly 
because it's part of oVirt HCI, so don't just send me away, please!

Problem: GlusterD refusing to start due to quorum issues for volumes where it 
isn’t contributing any brick

(I've had this before on a different farm, but there it was transitory. Now I 
have it in a more observable manner, that's why I open a new topic)

In a test farm with recycled servers, I started running Gluster via oVirt 
3node-HCI, because I got 3 machines originally.
They were set up as group A in a 2:1 (replica:arbiter) oVirt HCI setup with 
'engine', 'vmstore' and 'data' volumes, one brick on each node.

I then got another five machines with hardware specs that were rather different 
to group A, so I set those up as group B to mostly act as compute nodes, but 
also to provide extra storage, mostly to be used externally as GlusterFS 
shares. It took a bit of fiddling with Ansible but I got these 5 nodes to serve 
two more Gluster volumes 'tape' and 'scratch' using dispersed bricks (4 
disperse:1 redundancy), RAID5 in my mind.

The two groups are in one Gluster, not because they serve bricks to the same 
volumes, but because oVirt doesn't like nodes to be in different Glusters (or 
actually, to already be in a Gluster when you add them as host node). But the 
two groups provide bricks to distinct volumes, there is no overlap.

After setup things have been running fine for weeks, but now I needed to 
restart a machine from group B, which has ‘tape’ and ‘scratch’ bricks, but none 
from original oVirt ‘engine’, ‘vmstore’ and ‘data’ in group A. Yet the gluster 
daemon refuses to start, citing a loss of quorum for these three volumes, even 
if it has no bricks in them… which makes no sense to me.

I am afraid the source of the issue is a conceptual one: I clearly don't really 
understand some design assumptions of Gluster.
And I'm afraid the design assumptions of Gluster and of oVirt (even with HCI), 
are not as related as one might assume from the marketing materials on the 
oVirt home-page.

But most of all I'd like to know: How do I fix this now?

I can't heal 'tape' and 'scratch', which are growing ever more apart while the 
glusterd on this machine in group B refuses to come online for lack of a quorum 
on volumes where it is not contributing bricks.
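
For reference, these are the knobs I am planning to look at next (just a sketch, and I am not at all sure these are the right ones):

# global ratio and the per-volume server-side quorum type glusterd evaluates at startup
gluster volume get all cluster.server-quorum-ratio
gluster volume get engine cluster.server-quorum-type
gluster volume get scratch cluster.server-quorum-type
# last resort, with obvious availability/split-brain trade-offs:
# gluster volume set scratch cluster.server-quorum-type none
gluster volume heal scratch info summary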
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives:


[ovirt-users] Re: How to Backup a VM

2020-09-03 Thread thomas
> On Sun, Aug 30, 2020 at 7:13 PM  
> Using export domain is not a single click, but it is not that complicated.
> But this is good feedback anyway.
> 
> 
> I think the issue is gluster, not qemu-img.
> 
> 
> How did you try? transfer via the UI is completely different than
> transfer using the python API.
> 
> From the UI, you get the image content on storage, without sparseness
> support. If you
> download 500g raw sparse disk (e.g. gluster with allocation policy
> thin) with 50g of data
> and 450g of unallocated space, you will get 50g of data, and 450g of
> zeroes. This is very
> slow. If you upload the image to another system you will upload 500g
> of data, which will
> again be very slow.
> 
> From the python API, download and upload support sparseness, so you
> will download and
> upload only 50g. Both upload and download use 4 connections, so you
> can maximize the
> throughput that you can get from the storage. From python API, you can
> convert the image
> format during download/upload automatically, for example download raw
> disk to qcow2
> image.
> 
> Gluster is a challenge (as usual), since when using sharding (enabled
> by default for ovirt),
> it does not report sparness. So even from the python API you will
> download the entire 500g.
> We can improve this using zero detection but this is not implemented yet.
> 
> 
> In our lab we tested upload of 100 GiB image and 10 concurrent uploads
> of 100 GiB
> images, and we measured throughput of 1 GiB/s:
> https://bugzilla.redhat.com/show_bug.cgi?id=1591439#c24
> 
> I would like to understand the setup better:
> 
> - upload or download?
> - disk format?
> - disk storage?
> - how is storage connected to host?
> - how do you access the host (1g network? 10g?)
> - image format?
> - image storage?
> 
> 
> backup domain is a partly cooked feature and it is not very useful.
> There is no reason
> to use it for moving VMs from one environment to another.
> 
> I already explained how to move vms using a data domain. Check here:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/ULLFLFKBAW7...
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/GFOK55O5N4S...
> 
> I'm not sure it is documented properly, please file a documentation
> bug if we need to
> add something to the documentation.
> 
> 
> If you cloned a vm to data domain and then detach the data domain
> there is nothing to cleanup in the source system.
> 
> 
> We have this in 4.4, try to select a VM and click "Export".
> 
> Nir
> On Sun, Aug 30, 2020 at 7:13 PM  
> Using export domain is not a single click, but it is not that complicated.
> But this is good feedback anyway.
> 
> 
> I think the issue is gluster, not qemu-img.

From what I am gathering from your feedback, that may be very much so, and I 
think it's a major concern.

I know RHV started out much like vSphere or Oracle Virtualization without HCI, 
but with separated storage and dedicated servers for the management. If you 
have scale, HCI is quite simply inefficient.

But if you have scale, you either are already cloud yourself or going there. So 
IMHO HCI in small lab, edge, industrial or embedded applications is *the* 
future for HCI products and with it for oVirt. In that sense I perfectly 
subscribe to your perspective that the 'Python-GUI' is the major selling point 
of oVirt towards developers, but where Ceph, NAS and SAN will most likely be 
managed professionally, the HCI stuff needs to work out of your box--perfectly.

In my case I am lego-ing surplus servers into an HCI to use both as resilient 
storage and for POC VMs which are fire and forget (a host goes down, the VMs 
get restarted elsewhere, no need to rush in and rewire things if an old host 
had its final gasp).

The target model at the edge I see is more what I have at home in my home-lab, 
which is basically a bunch of NUCs, Atom J5005 with 32GB and 1TB SATA at the 
low end, and now with 14nm Core CPUs being pushed out of inventories for cheap, 
even a NUC10 i7-10710U with 64GB of RAM and 1TB of NVMe, a fault tolerant 
cluster well below 50Watts in normal operations and with no moving parts.

In the corporate lab these are complemented by big ML servers for the main 
research, where the oVirt HCI simply adds storage and VMs for automation jobs, 
but I'd love to be able to use those also as oVirt compute nodes, at least 
partially: The main workloads there run under Docker because of the easy GPU 
integration. It's not that dissimilar in the home-lab, where my workstations 
(not 24/7 and often running Windows) may sometimes be added as compute nodes, 
but not part of the HCI parts.

I'd love to string these all together via a USB3 Gluster and use the on-board 
1Gbit for the business end of the VMS, but since nobody offers a simple USB3 
peering network, I am using 2.5 or 5GBit USB Ethernet adapters instead for 
3-node HCI (main) and 1-node HCI (disaster/backup/migration).
> 
> 
> How did you try? transfer via the UI is completely different 

[ovirt-users] CLI for HCI setup

2020-09-02 Thread Michael Thomas
Is there a CLI for setting up a hyperconverged environment with 
glusterfs?  The docs that I've found detail how to do it using the 
cockpit interface[1], but I'd prefer to use a cli similar to 
'hosted-engine --deploy' if it is available.


Thanks,

--Mike
[1]https://www.ovirt.org/documentation/gluster-hyperconverged/chap-Deploying_Hyperconverged.html
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/KBSWDQXASO7PT5ZTWCH34DXSPJAQ3DMO/


[ovirt-users] Re: How can Gluster be a HCI default, when it's hardly ever working?

2020-09-02 Thread thomas
In this specific case Ieven used virgin hardware originally.

Once I managed to kill the hosted-engine by downgrading the datacenter cluster 
to legacy, I re-installed all gluster storage from the VDO level up. No traces 
of a file system should be left with LVM and XFS on top, even if I didn't 
actually null the SSD (does writing nulls to an SSD actually cost you an 
overwrite these days or is that translated into a trim by the firmware?)

No difference in terms of faults between the virgin hardware and the 
re-install, so stale Gluster extended file attributes etc. (your error theory, 
I believe) is not a factor.

Choosing between 'vmstore' and 'data' domains for the imports makes no 
difference, full allocation over thin allocation neither. But actually I didn't 
just see write errors from qemu-img, but also read-errors, which had me 
concerned about some other corruption source. That was another motivation to 
start with a fresh source, which meant a backup-domain instead of an export 
domain or OVAs.

The storage underneath the backup domain is NFS (Posix has a 4k issue and I'm 
not sure I want to try moving Glusters between farms just yet), which is easy 
to detach at the source and import at the target. If NFS is your default, oVirt 
can be so much easier, but that more 'professional' domain we use vSphere and 
actually SAN storage. The attraction of oVirt for the lab use case, critically 
depends on HCI and gluster.

The VMs were fine running from the backup domain (which incidentally must have 
lost its backup attribute at the target, because otherwise it should have kept 
the VMs from launching...), but once I tried moving their disks to the gluster, 
I got empty or unusable disks again, or errors while moving.

The only way that I found to transfer gluster to gluster was to use disk 
uploads either via the GUI or by Python, but that results into fully allocated 
images and is very slow at 50MB/s even with Python. BTW sparsifying does 
nothing to those images, I guess because sectors full of nulls aren't actually 
the same as a logically unused sector. At least the VDO underneath should 
reduce some of the overhead.
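
For anyone wanting to check what actually landed on the storage domain, this is roughly how logical and allocated sizes can be compared (paths are placeholders for the image under the gluster mount; virt-sparsify is offline-only and not verified to help on images that were uploaded fully allocated):

IMG=/rhev/data-center/mnt/glusterSD/<host>:_vmstore/<sd-uuid>/images/<disk-uuid>/<vol-uuid>
qemu-img info "$IMG"            # virtual size vs. disk size as qemu sees it
du -h --apparent-size "$IMG"    # logical file size
du -h "$IMG"                    # blocks actually allocated on the brick
# virt-sparsify --in-place "$IMG"   # VM must be down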
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/TTQE7YLN5JKABRGSNOFTV3FMMZNO2DRC/


[ovirt-users] How can Gluster be a HCI default, when it's hardly ever working?

2020-08-31 Thread thomas
I've just tried to verify what you said here.

As a base line I started with the 1nHCI Gluster setup. From four VMs, two 
legacy, two Q35 on the single node Gluster, one survived the import, one failed 
silently with an empty disk, two failed somewhere in the middle of qemu-img 
trying to write the image to the Gluster storage. For each of those two, this 
always happened at the same block number, a unique one per machine, not in 
random places, as if qemu-img reading and writing the very same image could not 
agree. That's two types of error and a 75% failure rate.

I created another domain, basically using an NFS automount export from one of 
the HCI nodes (a 4.3 node serving as 4.4 storage) and imported the very same 
VMs (source all 4.3) transported via a re-attached export domain to 4.4. Three 
of the four imports worked fine, no error with qemu-img writing to NFS. All VMs 
had full disk images and launched, which verified that there is nothing wrong 
with the exports at least.

But there was still one, that failed with the same qemu-img error.

I then tried to move the disks from NFS to Gluster, which internally is also 
done via qemu-img, and I had those fail every time.

Gluster or HCI seems a bit of a Russian roulette for migrations, and I am 
wondering how much it is better for normal operations.

I'm still going to try moving via a backup domain (on NFS) and moving between 
that and Gluster, to see if it makes any difference.

I really haven't done a lot of stress testing yet with oVirt, but this 
experience doesn't build confidence.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/XM6YYH5H455EPGA33MYDLHYY2J3N35UT/


[ovirt-users] Re: How can you avoid breaking 4.3.11 legacy VMs imported in 4.4.1 during a migration?

2020-08-31 Thread thomas
Thanks for the suggestion, I tried that, but I didn't get very far on a single 
node HCI cluster...
And I'm afraid it won't be much better on HCI in general, which is really the 
use case I am most interested in.

Silently converting VMs is something rather unexpected from a hypervisor, doing 
it twice may result in the same machine here only by accident.

That type of design decision needs highlighting in the documentation, because 
users just won't be expecting it.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/MUGBA2TBEEIHIJJLBYVQTYQKUTOJ2MWK/


[ovirt-users] Re: Promiscuous Mode

2020-08-31 Thread thomas
Not sure it will actually do that, but if you create a new network profile, you 
can select a 'port mirroring' option: It is my understanding that is what you 
need. You may also want to deselect the filtering rules in the same place.
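
The same can be done through the REST API; a rough sketch (engine URL, credentials and the network UUID are placeholders, and I have not double-checked the exact payload):

# create a vNIC profile with port mirroring enabled on an existing network
curl -k -u 'admin@internal:PASSWORD' -H 'Content-Type: application/xml' \
  -d '<vnic_profile><name>mirror</name><network id="NETWORK-UUID"/><port_mirroring>true</port_mirroring></vnic_profile>' \
  https://engine.example.com/ovirt-engine/api/vnicprofiles
# the network filter can then be cleared on the same profile in the UI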
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/PT6T44HFM5GXYIQPUDHIQMK5IRITL7OM/


[ovirt-users] Re: can't mount an export domain with 4K block size (VDO) as PosixFS (xfs)

2020-08-31 Thread thomas
I like playing with shiny new toys and VDO is one :-)

Actually with that it shouldn't matter if whatever is put on top actually has 
sparse allocation in the file format or just blocks with zeros, which makes for 
a reusable HDD in case I am working with imageio images that are raw.

Good to know that using NFS directly won't cost extra performance, because 
that's how I got around the 4K issue.

Alas, plenty of qemu-img imports still fail with the ENOENT error on the 3nHCI 
gluster. From what I am reading between the lines, oVirt would be rock solid, 
if somebody didn't want to compete with Nutanix and had put Gluster 
underneath...

With a 25% success rate at trying to transfer VMs from 4.3 to 4.4 I am... in no 
position to migrate, really.

Trying the backup domains next, but their documentation is pretty awful, if not 
non-existent.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/Y4CHR436ZNE2S6PACFLXQY32E4ASFE5I/


[ovirt-users] Re: How can you avoid breaking 4.3.11 legacy VMs imported in 4.4.1 during a migration?

2020-08-31 Thread thomas
On a machine that survived the import, that worked as expected.

May want to add that to a check list, that the original machine type for legacy 
isn't carried over after an import but needs to be set explicitly.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/AB55P6ATS2NLSOMAU2YCOTDL5TDOVHR4/


[ovirt-users] Re: How can you avoid breaking 4.3.11 legacy VMs imported in 4.4.1 during a migration?

2020-08-31 Thread thomas
I might have found one: You can set the emulated machine to 'legacy' before the 
first launch.

No idea yet, if it will actually work, because the boot disk was copied empty...
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/RXIEQC5URNAZ3VIYOW27FOZCV3AFE23W/


[ovirt-users] How can you avoid breaking 4.3.11 legacy VMs imported in 4.4.1 during a migration?

2020-08-31 Thread thomas
Testing the 4.3 to 4.4 migration... what I describe here as facts is mostly 
observations and conjecture, could be wrong, just makes writing easier...

While 4.3 seems to maintain a default emulated machine type 
(pc-i440fx-rhel7.6.0 by default), it doesn't actually allow setting it in the 
cluster settings: Could be built-in, could be inherited from the default 
template... Most of my VMs were created with the default on 4.3.

oVirt 4.4 presets that to pc-q35-rhel8.1.0 and that has implications:
1. Any VM imported from an export on a 4.3 farm, will get upgraded to Q35, 
which unfortunately breaks things, e.g. network adapters getting renamed as the 
first issue I stumbled on some Debian machines
2. If you try to compensate by lowering the cluster default from Q35 to 
pc-i440fx the hosted-engine will fail, because it was either built or came as 
Q35 and can no longer find critical devices: It evidently doesn't take/use the 
VM configuration data it had at the last shutdown, but seems to re-generate it 
according to some obscure logic, which fails here.
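
A quick way to see which machine type a VM actually got on its host is the read-only libvirt view (a sketch; the VM name is a placeholder):

virsh -r list
virsh -r dumpxml <vm-name> | grep -i 'machine='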

I've tried creating a bit of backward compatibility by creating another 
template based on pc-i440fx, but at the time of the import, I cannot switch the 
template.
If I try to downgrade the cluster, the hosted-engine will fail to start and I 
can't change the template of the hosted-engine to something Q35.

Currently this leaves me in a position where I can't separate the move of VMs 
from 4.3 to 4.4 and the upgrade of the virtual hardware, which is a different 
mess for every OS in the mix of VMs.

Recommendations, tips anyone?

P.S. A hypervisor reconstructing the virtual hardware from anywhere but storage 
at every launch, is difficult to trust IMHO.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/36WNCP6YMRM3MG44WIVHLVOUD2MACDQ5/


[ovirt-users] can't mount an export domain with 4K block size (VDO) as PosixFS (xfs)

2020-08-31 Thread thomas
After this 
(https://devconfcz2020a.sched.com/event/YOtG/ovirt-4k-teaching-an-old-dog-new-tricks)
 

I sure do not expect this (log below):

Actually I am trying to evaluate just how portable oVirt storage is and this 
case I had prepared a USB3 HDD with VDO, which I could literally move between 
farms to transport VMs.
Logical disks are typically large for simplicity within the VMs, QCOW2 and VDO 
assumed to compensate for this 'lack of planning' while the allocated storage 
easily fits the HDD.

Once I got beyond the initial issues, I came across this somewhat unexpected 
issue: VDO storage uses 4k blocks all around, but evidently when you mount an 
export domain (or I guess any domain) as POSIX, 512byte blocks are assumed 
somewhere and 4k blocks rejected.

I'd say that is a bug in VDSM, right?
Or is there anything in the mount options to fix this?
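
For reference, this is how the block sizes the device reports can be confirmed (device name as in the log below; the vdo output fields are from memory, so treat this as a sketch):

blockdev --getss --getpbsz /dev/mapper/vdo1   # logical and physical sector size
vdo status --name vdo1 | grep -i 'block size'
# VDO volumes can also be created with 512-byte emulation, e.g.
# vdo create --name=vdo1 --device=/dev/sdX --emulate512=enabled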

2020-08-31 18:44:40,424+0200 INFO  (periodic/0) [vdsm.api] START 
repoStats(domains=()) from=internal, 
task_id=7a293dec-85b3-4b82-92b7-4e7d03b40343 (api:48)
2020-08-31 18:44:40,425+0200 INFO  (periodic/0) [vdsm.api] FINISH repoStats 
return={'9992dc21-edf2-4951-9020-7c78f1220e02': {'code': 0, 'lastCheck': '1.6', 
'delay': '0.00283651', 'valid': True, 'version': 5, 'acquired': True, 'actual': 
True}, '25d95783-44df-4fda-b642-55fe09162149': {'code': 0, 'lastCheck': '2.2', 
'delay': '0.00256944', 'valid': True, 'version': 5, 'acquired': True, 'actual': 
True}, '148d9e9e-d7b8-4220-9ee8-057da96e608c': {'code': 0, 'lastCheck': '2.2', 
'delay': '0.00050217', 'valid': True, 'version': 5, 'acquired': True, 'actual': 
True}} from=internal, task_id=7a293dec-85b3-4b82-92b7-4e7d03b40343 (api:54)
2020-08-31 18:44:41,121+0200 INFO  (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call 
Host.ping2 succeeded in 0.00 seconds (__init__:312)
2020-08-31 18:44:41,412+0200 INFO  (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call 
Host.ping2 succeeded in 0.00 seconds (__init__:312)
2020-08-31 18:44:41,414+0200 INFO  (jsonrpc/2) [vdsm.api] START 
repoStats(domains=['9992dc21-edf2-4951-9020-7c78f1220e02']) from=::1,44782, 
task_id=657c820e-e2eb-4b1d-b2fb-1772b3af6f32 (api:48)
2020-08-31 18:44:41,414+0200 INFO  (jsonrpc/2) [vdsm.api] FINISH repoStats 
return={'9992dc21-edf2-4951-9020-7c78f1220e02': {'code': 0, 'lastCheck': '2.6', 
'delay': '0.00283651', 'valid': True, 'version': 5, 'acquired': True, 'actual': 
True}} from=::1,44782, task_id=657c820e-e2eb-4b1d-b2fb-1772b3af6f32 (api:54)
2020-08-31 18:44:41,414+0200 INFO  (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC call 
Host.getStorageRepoStats succeeded in 0.00 seconds (__init__:312)
2020-08-31 18:44:41,965+0200 INFO  (jsonrpc/3) [IOProcessClient] (Global) 
Starting client (__init__:308)
2020-08-31 18:44:41,984+0200 INFO  (ioprocess/87332) [IOProcess] (Global) 
Starting ioprocess (__init__:434)
2020-08-31 18:44:42,006+0200 INFO  (jsonrpc/3) [storage.StorageDomainCache] 
Removing domain fe9fb0db-2743-457a-80f0-9a4edc509e9d from storage domain cache 
(sdc:211)
2020-08-31 18:44:42,006+0200 INFO  (jsonrpc/3) [storage.StorageDomainCache] 
Invalidating storage domain cache (sdc:74)
2020-08-31 18:44:42,006+0200 INFO  (jsonrpc/3) [vdsm.api] FINISH 
connectStorageServer return={'statuslist': [{'id': 
'----', 'status': 0}]} 
from=:::192.168.0.87,40378, flow_id=05fe72ef-c8ac-4e03-8453-5171e5fc5f8b, 
task_id=b92d4144-a9db-4423-ba6b-35934d4f9200 (api:54)
2020-08-31 18:44:42,008+0200 INFO  (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call 
StoragePool.connectStorageServer succeeded in 3.11 seconds (__init__:312)
2020-08-31 18:44:42,063+0200 INFO  (jsonrpc/5) [vdsm.api] START 
getStorageDomainsList(spUUID='----', 
domainClass=3, storageType='', remotePath='/dev/mapper/vdo1', options=None) 
from=:::192.168.0.87,40378, flow_id=eff548f9-b663-45dd-b8a5-5854f9b5dde8, 
task_id=98ca66a5-edee-40d4-9253-6a46409241cc (api:48)
2020-08-31 18:44:42,063+0200 INFO  (jsonrpc/5) [storage.StorageDomainCache] 
Refreshing storage domain cache (resize=True) (sdc:80)
2020-08-31 18:44:42,063+0200 INFO  (jsonrpc/5) [storage.ISCSI] Scanning iSCSI 
devices (iscsi:442)
2020-08-31 18:44:42,101+0200 INFO  (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call 
Host.ping2 succeeded in 0.00 seconds (__init__:312)
2020-08-31 18:44:42,128+0200 INFO  (jsonrpc/5) [storage.ISCSI] Scanning iSCSI 
devices: 0.06 seconds (utils:390)
2020-08-31 18:44:42,129+0200 INFO  (jsonrpc/5) [storage.HBA] Scanning FC 
devices (hba:60)
2020-08-31 18:44:42,182+0200 INFO  (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call 
Host.ping2 succeeded in 0.00 seconds (__init__:312)
2020-08-31 18:44:42,184+0200 INFO  (jsonrpc/4) [vdsm.api] START 
repoStats(domains=['9992dc21-edf2-4951-9020-7c78f1220e02']) from=::1,44782, 
task_id=7b3f1eb8-f7e3-4467-93a1-27ef9721c90c (api:48)
2020-08-31 18:44:42,184+0200 INFO  (jsonrpc/4) [vdsm.api] FINISH repoStats 
return={'9992dc21-edf2-4951-9020-7c78f1220e02': {'code': 0, 'lastCheck': '3.4', 
'delay': '0.00283651', 'valid': True, 'version': 5, 

[ovirt-users] Re: Error exporting into ova

2020-08-30 Thread thomas
BTW: This is the message I get on the import:
VDSM nucvirt command HSMGetAllTasksStatusesVDS failed: value=low level Image 
copy failed: ("Command ['/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', 
'-T', 'none', '-f', 'qcow2', 
'/rhev/data-center/mnt/petitcent.mtk.hoberg.net:_flash_export/fe9fb0db-2743-457a-80f0-9a4edc509e9d/images/3be7c1bb-377c-4d5e-b4f6-1a6574b8a52b/845cdd93-def8-4d84-9a08-f8c991f89fe3',
 '-O', 'raw', 
'/rhev/data-center/mnt/glusterSD/nucvirt.mtk.hoberg.net:_vmstore/ba410e27-458d-4b32-969c-ad0c37edaceb/images/3be7c1bb-377c-4d5e-b4f6-1a6574b8a52b/845cdd93-def8-4d84-9a08-f8c991f89fe3']
 failed with rc=1 out=b'' err=bytearray(b'qemu-img: error while writing sector 
9566208: No such file or directory\\n')",) abortedcode=261
8/29/20 11:41:21 AM
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/QRZ5FQVWLMIJ2LBQC7ZLZ4XGWKRMLBQY/


[ovirt-users] Re: Error exporting into ova

2020-08-30 Thread thomas
> On Fri, Aug 28, 2020 at 2:31 AM  
> You should really try the attach/detach storage domain, this is the
> recommended way to move
> vms from one ovirt system to another.
> 
> You could detach the entire domain with all vms from the old system,
> and connect it to the new
> system, without copying even one bit.
> 
> I guess you cannot do this because you don't use shared storage?
> 
These are all HCI setups with GlusterFS, so storage is shared in a way...

I am also experimenting with a backup (not export) domain on NFS and/or 
removable media (just temp local storage, exported via NFS), but the handling 
is very odd, to say the least (see my other post for the full story).
Basically the documentation says you move all VM disks to the backup domain 
after cloning the VM. And then it says nothing more... (how does the VM 
definition get carried over? Can I then destroy the remaining clone VM? Do I need 
to re-create a similar VM at the target? etc.)

The in-place upgrade procedure in the docs for the HCI case has far too many 
tersely described steps that can go wrong with someone like me doing it: I even 
manage to fail a green-field setup many times somehow

And even if I were to do the upgrade as described, I do need to know that the 
export/clean-migration/import is still a viable option, should something go 
wrong.
> ...
> 
> Using ovirt 4.3 when 4.4 was released is going to be painful, don't do this.
>
That's why I am migrating, but for that I need to prove a working plan B
> ...
> 
> You are hitting https://bugzilla.redhat.com/1854888
> 
Unfortunately the description doesn't tell if the failure of the silent 
qemu-img was on the export side, resulting in a corrupted image: I am assuming 
that qemu-img is used in both export and import.

The failure on the import is not silent, just doesn't seem to make a lot of 
sense, because qemu-img is reporting a write error at the local single node HCI 
gluster target, which has plenty of space and is essentially a loopback in 
1nHCI.
> ...
> 
> No, export domain is using qemu-img, which is the best tool for copying 
> images.
> This is how all disks are copied in oVirt in all flows. There are no
> issues like ignored
> errors or silent failures in storage code.
My problem is getting errors and not understanding what's causing them.

Qemu-img on the target HCI single node Gluster is reporting a write error at 
varying block numbers, often after dozens of gigabytes have already been 
transferred. There is plenty of space on the Gluster, an SSD with VDO 
underneath so the higher risk is actually the source, which is the NFS mount 
from the export domain.

I've tried uploading the image using imageio and your Python sample from the 
SDK, but just as I had done that (with 50MB/s at 1/6 of the performance of the 
qemu-img transfer), I managed to kill the 4.4 cluster by downgrading the 
machine type of the hosted-engine, when I was really trying to make a 
successfully restored VM work with renamed Ethernet devices...

The upload via imageio completed fully, I hadn't tested the disk image with a 
machine yet to see if it would boot.
> 
> ...
> 
> There are no timeouts in storage code, e.g. attach/detach domain, or
> export to export
> domain.
> 
> Nir
Well, that was almost my last hope, because I don't know what could make the 
qemu-img import transfer fail on a write, when the very same image works with 
imageio... Actually, the big difference there is that the resulting disk, which 
is logically configured at 500GB is actually logically consuming 500GB in the 
domain, while sparse images that make it successfully through qemu-img, retain 
their much smaller actual size. VDO is still underneath so it may not matter, 
and I didn't have a chance to try sparsify before I killed the target cluster.

I have also prepared a USB3 disk to act as export domain, which I'll physically 
move, just to ensure the NFS pipe in the qemu-img job isn't the real culprit.

And I guess I'll try the export again, to see if I overlooked some error there.
> 
> 
> Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/VDKIOFLJSEGSXV64FJ6BHZVGKS5LYHWH/


[ovirt-users] Re: How to Backup a VM

2020-08-30 Thread thomas
Struggling with bugs and issues on OVA export/import (my clear favorite 
otherwise, especially when moving VMs between different types of hypervisors), 
I've tried pretty much everything else, too.

Export domains are deprecated and require quite a bit of manual handling. 
Unfortunately the buttons for the various operations are all over the place 
e.g. the activation and maintenance toggles are in different pages.

In the end the mechanisms underneath (qemu-img) seem very much the same and 
suffer from the same issues (I have larger VMs that keep failing on imports).

So far the only fool-proof method has been to use the imageio daemon to upload 
and download disk images, either via the Python API or the Web-GUI. Transfer 
times are terrible though, 50MB/s is quite low when the network below is 
2.5-10Gbit and SSDs all around.

Obviously with Python as everybody's favorite GUI these days, you can also copy 
and transfer the VMs complete definition, but I am one of those old guys, who 
might even prefer a real GUI to mouse clicks on a browser.

The documentation on backup domains is terrible. What's missing behind the 404 
link in oVirt becomes a very terse section in the RHV manuals, where you're 
basically just told that after cloning the VM, you should then move its disks 
to the backup domain...

What you are then supposed to do with the cloned VM, if it's ok to simply 
throw it away, because the definition is silently copied to the OVF_STORE on 
the backup... none of that is explained or mentioned.

There is also no procedure for restoring a machine from a backup domain, when 
really a cloning process that allows a target domain would be pretty much what 
I'd vote for.

Redhat really wants you to buy the professional product there, or use the 
Python GUI.

I've sadly found the OVA files generated by oVirt (QEMU, really) to be 
incompatible with both VMware Workstation 15.5 and VirtualBox 6.12. No idea 
whose fault this is, but both sides are obviously not doing plug-fests every 
other week and I'm pretty sure this could be fixed manually when needed.
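
Since the OVA is just a tar archive holding an OVF descriptor plus the disk image, inspecting and hand-editing it is at least possible (a sketch; the exact name of the .ovf inside the archive may differ):

tar tvf myvm.ova
tar xvf myvm.ova --wildcards '*.ovf'
# edit the descriptor, repack with tar, and check the disk itself with qemu-img info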
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/IHTKTQJ6CWD27NOPUQTU3NOEMPPHDPNI/


[ovirt-users] Re: How to Backup a VM

2020-08-30 Thread thomas
I found OVA export/import to be rather tricky.
On the final 4.3 release there still remains a bug, which can lead to the OVA 
files containing nothing but zeros for the disks, a single line fix that only 
made it into 4.4
https://bugzilla.redhat.com/show_bug.cgi?id=1813028 

Once I fixed that on 4.3 I had the issue that imports from OVAs, but also from 
an export domain, were failing: the qemu-img process doing the transfer hit 
a write error on the local gluster storage, often after a considerable part of 
the work had already been done (the particular image was 115GB in allocated 
size on a 500GB thin disk).

Since the gluster is all fine, has space etc. I was thinking that perhaps a 
timeout might be the real cause (I am not using server class hardware in the 
home-lab, so built-in 'patience' may be too short).

This gives me a hint where to make ansible wait longer, just in case it's a 
timing issue.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/AU2PEE2SYZGQN7U5AOTR2ERTL5NRTXU2/


[ovirt-users] Re: Hosted Engine stuck in Firmware

2020-08-30 Thread thomas
Thanks for diving into that mess first, because it allowed me to understand 
what I had done as well...

In my case the issue was a VM moved from 4.3 to 4.4 seemed to be silently 
upgraded from "default" (whatever was default on 4.3) to "Q35", which seems to 
be the new default of 4.4.

But that had it lose the network, because udev was now renaming the NIC in yet 
another manner, when few VMs ever need anything beyond eth0 anyway.

So I went ahead and changed the cluster default to those of the 4.3 cluster 
(including Nehalem CPUs, because I also use J5005 Atom systems). BTW, that was 
initially impossible as the edit-button for the cluster was always greyed out. 
But on a browser refresh, it suddenly was enabled...
What I don't remember is if the cluster had a BIOS default (it doesn't on 4.3), 
or if I changed that in the default template, which is mentioned somewhere here 
as being rather destructive.

I was about to re-import the machine from an export domain, when I did a 
scheduled reboot of the single node HCI cluster after OS updates.

Those HCI reboots always require a bit of twiddling on 4.3 and 4.4 for the 
hosted-engine to start, evidently because of some race conditions (requiring 
restarts of glusterd/ovirt-ha-broker/ovirt-ha-agent/vdsmd to fix), but this 
time the SHE simply didn't want to start at all, complaining about missing PCI 
devices at boot after some digging through log files.

With my 4.4. instance currently dead I don't remember if the BIOS or PCI vs 
PCIe machine type is a cluster attribute or part of the template but I do seem 
to remember that the hosted-engine is a bit special here, especially when it 
comes to picking up the base CPU type.

What is a bit astonishing is the fall-through processing that seems to go on 
here, when an existing VM should have its hardware nailed down when it was shut 
down.

I then realized that I might have killed the hosted-engine right there.

And no, /var/run/ovirt...vm.cfg is long gone and I guess it's time for a 
re-install.

For me one issue remains unclear: How identical do machines remain as they are 
moved from a 4.3 host to a 4.4 host?

In my view a hypervisor's most basic social contract is to turn a machine into 
a file and the file back into the very same machine, hopefully even for 
decades. Upgrades of the virtual hardware should be possible, but under control 
of the user/orchestrator.

I am afraid that oVirt's dynamic reconstruction of the machine from database 
data doesn't always respect that social contract and that needs at least 
documentation, if not fixing.

The 4.3 to 4.4 migration is far from seamless already, this does not help.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/ANWFNF6WW53FADMBW5WZR4C3QCV5L765/


[ovirt-users] Re: deprecating export domain?

2020-08-30 Thread thomas
While I am trying to prepare a migration from 4.3 to 4.4 with the base OS 
switch, I am exploring all variants of moving VMs.

OVA export/import and export domains have issues and failures so now I am 
trying backup domains and fail to understand how they are to be used and the 
sparse documentation does not help.

It says:
1. create a clone VM
2. move the cloned VM's disks to the backup domain
and then moves on to the next topic

What I find missing:
1. How do I move the cloned VM (basically whatever configuration data is not 
inside the disk) away from the active domain?
2. Can I just delete it, because 'magically' the OVF store on the backup domain 
will contain all VM data for any VM that has disk images there?
3. A matching recovery/restore procedure: Is it really just as simple as moving 
the disk back (with the OVF/XML definition silently following)?

In terms of usability I'd really love a clone where I can specify the storage 
domain target...
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/XOXKJAONF2EF5GCFHC7FX2IDHTPGX35Q/


[ovirt-users] Re: Error exporting into ova

2020-08-27 Thread thomas
I am testing the migration from CentOS7/oVirt 4.3 to CentOS8/oVirt 4.4.

Exporting all VMs to OVAs, and re-importing them on a new cluster built from 
scratch seems the safest and best method, because in the step-by-step 
migration, there is simply far too many things that can go wrong and no easy 
way to fail-back after each step.

But of course it requires that one of the most essential operations in a 
hypervisor actualy works.

For me a hypervisor turns a machine into a file and a file into a machine: 
That's the most appreciated and fundamental quality of it.

oVirt fails right there, repeatedly, and for different reasons and without even 
reporting an error.

So I have manually put the single-line fix in, which settles udev to ensure 
that disks are not exported as zeros. That's the bug which renders the final 
release oVirt 4.3 forever unfit, 4 years before the end of maintenance of 
CentOS7, because it won't be fixed there.

Exporting VMs and re-importing them on another farm generally seemed to work 
after that.

But just as I was exporting not one of the trivial machines, that I have been 
using for testing, but one of the bigger ones that actually contains a 
significant amount of data, I find myself hitting this timeout bug.

The disks for both the trival and less-trivial are defined at 500GB, thinly 
allocated. The trivial is the naked OS at something like 7GB actually 
allocated, the 'real' has 113GB allocated. In both cases the OVA export file to 
a local SSD xfs partition is 500GB, with lots of zeros and sparse allocation in 
the case of the first one.

The second came to 72GB of 500GB actually allocated, which didn't seem like a 
good sign already, but perhaps there was some compression involved?

Still, the export finished without error or incident and the import on the other 
side went just as well. The machine even boots and runs; it was only once I 
started using it that I suddenly had all kinds of file system errors... it turns 
out the difference between the 113GB allocated and the 72GB that made it into the 
OVA was really cut off and missing, and there is nobody and nothing checking for 
that.

I now know that qemu-img is used in the process, which actually runs in a pipe. 
There is no checksumming or any other logic involved to ensure that the format 
conversion of the disk image has retained the integrity of the image. There is 
no obvious simple solution that I can think of, but the combination of 
processing the image through a pipe and an impatient ansible timeout results in 
a hypervisor which fails on the most important elementary task: Turn a machine 
into a file and back into a machine.
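
Until that is fixed, the only safeguard I can think of is checking every OVA by 
hand right after the export; a rough sketch (file and path names are examples, and 
qemu-img info will tell you whether the embedded disk is really qcow2):

OVA=/exports/bigvm.ova                      # example path

ls -lh "$OVA"                               # logical size, looks fine even when data is missing
du -h  "$OVA"                               # blocks actually written; a few KB here is a very bad sign

tar -tvf "$OVA"                             # the .ovf plus one member per disk

mkdir -p /tmp/ova-check && tar -xf "$OVA" -C /tmp/ova-check
for disk in /tmp/ova-check/*; do
    case "$disk" in *.ovf) continue ;; esac
    qemu-img info  "$disk"                  # virtual vs. actual disk size and format
    qemu-img check "$disk"                  # consistency check (meaningful for qcow2 payloads)
done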

IMHO it makes oVirt a toy, not a tool. And worst of all, I am pretty sure that 
RHV has the same quality, even if the sticker price is probably quite different.

I have the export domain backup running right now, but I'm not sure it's not 
using the same mechanism under the cover with potentially similar results.

Yes, I know there is a Python script that will do the backup, and probably with 
full fidelity. And perhaps with this new infrastructure as code approach, that 
is how it should be done anyway.

But if you have a GUI, that should just work: No excuses.

P.S. The allocation size of the big VM in the export domain is again 72GB, with 
the file size at 500GB. I'll run the import on the other end, but by now I am 
pretty sure, the result will be no different.

Unless you resort to Python or some external tool, there is no reliable way to 
back up and restore a VM that amounts to less than 80 seconds' worth of data 
transfer, and no warning when corruption occurs.

I am not sure you can compete with Nutanix and VMware at this level of 
reliability.

P.P.S. So just where (and on which machine) do I need to change the timeout?
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/H6E2LNT76IPMDKBG6UHJQHVU5X3PUTPJ/


[ovirt-users] Re: Troubleshooting oVirt Node deploy FQDN not reachable

2020-08-27 Thread thomas
Just fresh from a single node HCI setup (CentOS 8 + oVirt 4.4.1) I did 
yesterday on an Intel i7-8559U NUC with a single 1TB NVMe storage stick and 
64GB of RAM:

The HCI wizard pretty much assumes a "second disk", but a partition will 
actually do. And it doesn't even have to be in /dev/mapper, even if the wizard 
seems to have some logic requiring that.

But it requires a partition to be present, which you'll need to create (empty) 
by hand and which in my case was something like /dev/nvme0n1p10 where the 
wizard puts /dev/sdb. That's because the wizard will then try to put VDO on 
that storage, which requires a block layer underneath, and then puts thinly 
allocated LVs on top of that for the ultimate flexibility, so you worry more 
about performance than about detailed planning of storage allocations.

And it needs to be really empty, if there is a file system or left-overs of 
one, the Wizard will fail with a not very helpful "device xx has been 
blacklisted" or similar.

Of course, without the HCI (first choice you make in the Wizard), you can set 
up your storage, even with Gluster, in many other ways, but I've found it 
easier to stick with the HCI route, at least given the equipment I had (no SAN, 
no NFS filer).
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/5XV5PYBJE2P6SGO3WJN765V2JE5ZQG2I/


[ovirt-users] Re: Is the Hosted Engine setup finished + Can't connect vdsm storage

2020-08-27 Thread thomas
It would be interesting to know, how the previous team got to six nodes: I 
don't remember seeing any documentation how to do that easily...

However, this state of affairs also seems to be quite normal, whenever I reboot 
a single node HCI setup: I've seen that with two systems now, one running 
4.3.11 on CentOS 7.8, the other 4.4.1 on CentOS 8.2.

What seems to happen in my case is some sort of a race condition or time-out, 
ovirt-ha-broker, ovirt-ha-agent and vdsmd all seem to fail in various ways, 
because glusterd isn't showing perfect connectivity between all storage nodes 
(actually in this case, it still fails to be perfect, even if there is only one 
node...)

I tend to restart glusterd carefully on any node that is seen as disconnected 
or not up (gluster volume status all), and once that is perfect and any gluster 
heals are through, I restart ovirt-ha-broker, ovirt-ha-agent and vdsmd nice and 
slow and not really in any particular order; I just have a look via systemctl 
status to see if they stop complaining or stopping.

In the meantime I check with hosted-engine --vm-status on all nodes to see if 
this "is the hosted engine setup finished" message disappears, and with a bit of 
patience, it tends to come back. You might also want to make sure that none of 
the nodes are in local maintenance and that the data center is not in global 
maintenance.
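
Roughly, the sequence I follow looks like this (engine/vmstore are the volume 
names on my setup, adjust to yours):

gluster volume status all                       # find bricks/nodes that are not online
systemctl restart glusterd                      # only on the node(s) that look disconnected

gluster volume heal engine info                 # wait until pending heals drain
gluster volume heal vmstore info

systemctl restart ovirt-ha-broker ovirt-ha-agent vdsmd
systemctl status ovirt-ha-broker ovirt-ha-agent vdsmd --no-pager

hosted-engine --vm-status                       # repeat until the message goes away
hosted-engine --set-maintenance --mode=none     # in case global maintenance was left on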

Let me tell you that I have pulled a lot of hair when I started with oVirt, 
because I tend to expect immediate reactions to any command I give. But here 
there is such a lot of automation going on in the background, that commands are 
really more like a bit of grease on the cogs of a giant gearbox and most of the 
time it just works automagically.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/X5MDHFRIQDN4KMYUWS7J56VQTGY2Z2H7/


[ovirt-users] Re: Gluster Volume Type Distributed

2020-08-27 Thread thomas
Replicated is pretty much hard coded into all the Ansible scripts for the HCI 
setup. So you can't do anything but replicated and only choose between arbiter 
or full replica there. Distributed doesn't really give you anything with three 
nodes, but with five, seven, nine or really high numbers, it becomes quite 
attractive.

Thing is, Gluster and oVirt are basically originally unrelated products and HCI 
is just a shuffle some people at Redhat thought cool (and it is). Not every 
permutation of aggregates between the two is supported, though: This is no 
Nutanix!

But of course, you have the source code...

Since I managed to scavenge another 5 storage-heavy nodes after I had started 
out with 3 nodes in an HCI cluster, I tried to see what I could do in terms of 
integration. Doing a full rebuild was out of the question, but adding another 
distributed volume (4+1) and a set of compute hosts seemed attractive, right?

I managed to get it done, twiddling ansible scripts and taking advantage of the 
fact that ansible won't do things it already considers done. So where it 
expected a replicated volume, I simply gave it a distributed volume that, on the 
surface, looked just like the replicated one would have.

I immediately forgot how I got it done, and of course I didn't take notes, but 
it works so far.
The hosted-engine is pretty generous in accepting volumes and nodes that look 
just like they had been done by the HCI setup; I guess that's because it has 
really been designed around this auto-discovery-like setup, to avoid having to 
insert some key configuration data somehow.

That's why it shouldn't be too difficult to switch from a 3-node HCI to a 5, 6 
or 9 node HCI, if you're a Gluster virtuoso and manage to restore/copy the data.

What I found most disappointing is the fact that Gluster doesn't seem to 
support converting replicated volumes into distributed ones, once you have 
enough bricks to make that attractive.
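
As far as I can tell, the best you can do is check what you have and build a new 
volume next to it; a sketch (volume, node and brick names are examples):

gluster volume info data | grep -E 'Type|Number of Bricks'   # confirm it really is replicate

# no in-place conversion: create a new, plain distributed volume on the extra bricks...
gluster volume create data-dist transport tcp \
    node4:/gluster_bricks/data-dist/brick \
    node5:/gluster_bricks/data-dist/brick \
    node6:/gluster_bricks/data-dist/brick
gluster volume start data-dist
# ...then add it as a new storage domain in oVirt and move the disks over there.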

Then again, with oVirt you should be able to migrate disks and VMs by attaching 
volumes and copying images. The only thing that is a bit tricky is copying the 
hosted engine, but compared to how they inject that thing during an HCI 
installation, all that would be easy, if you are deep enough into oVirt to 
understand what they are doing there.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/WWMJQKOP42WQ7O6GCXILUVUESWYG4XGH/


[ovirt-users] ovirt 4.4 self-hosted deployment questions

2020-08-27 Thread Michael Thomas
I have not been able to find answers to a couple of questions in the 
self-hosted engine documentation[1].


* When installing a new Enterprise Linux host for ovirt, what are the 
network requirements?  Specifically, am I supposed to set up the 
ovirtmgmt bridge myself on new hosts, or am I supposed to let that be 
handled by the engine when I add the new host to the engine?


* In the 'New Host' dialog on the engine management page, does the 
Hostname/IP that I enter have to be the host's name on the ovirtmgmt 
LAN?  If so, then it seems to me that I need to configure the ovirtmgmt 
bridge myself on new hosts.


* Does the engine need to be able to route outside of the cluster (eg to 
the WAN), or is it allowed to restrict the engine's routing to the local 
cluster?


* In the 'New Host' dialog on the engine management page, what is the 
meaning of 'Choose hosted engine deployment action'?  From the way it is 
phrased, it sounds like this will create a second engine in my cluster, 
which doesn't make sense.  Or does this mean that the new host will be 
able to run the Engine VM in a HA manner?



In my current test deployment I have 3 subnets in my cluster.  Network 
WAN is the WAN.  Network CLUSTER is for communication between cluster 
compute nodes, storage servers, and management servers.  Network OVIRT 
is for ovirt management and VM migration between hosts.


My first self-hosted engine host is connected to networks CLUSTER and 
OVIRT.  The engine VM is only connected to network OVIRT through a 
bridge on the host, but has a gateway that lets it route traffic to 
network CLUSTER (but not network WAN).


Is this an appropriate network setup for ovirt, or should there be no 
distinction between the CLUSTER and OVIRT networks?


--Mike


[1]https://www.ovirt.org/documentation/installing_ovirt_as_a_self-hosted_engine_using_the_command_line/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/6ZQUYBF2IL7XN3ORTBZRRBQANVAJ6427/


[ovirt-users] oVirt nested on oVirt works, when you enable MAC spoofing on host VMs

2020-08-13 Thread thomas
Howto: Create a new network profile that doesn't have the MAC spoofing filter 
included. In my case I used one without any filters, you may want to be more 
careful.
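
To double-check that the new profile really drops the filter, you can look at the 
running VM's libvirt XML on the host (the VM name is an example):

virsh -r dumpxml nested-node1 | grep -B1 -A2 filterref \
    || echo "no filterref element: no MAC-spoofing filter on the vNIC"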

Background:
I had tried the nested approach with the default settings and found that the 
hosted-engine setup worked just fine to the point where it was running as a 
nested VM on the install node, but that it failed to tie in the other gluster 
nodes or to communicate with the outside: There was something wrong with the 
network, but no obvious failures.

I tried using other, simpler "client hypervisors" (it's all KVM anyway) 
VirtualBox and 'native' KVM and the symptoms were the same: The VMs ran just 
fine, but the network didn't connect.

And then I remembered that obviously a hypervisor needs to override the MAC for 
each client VM if you want full network access, and I had come across 
"no-mac-spoofing" often enough that it somehow stayed on my mind... and finally 
it clicked.

Anyhow, I am validating full functionality now, and the main interest in this is 
the ability to do a full simulation/test of oVirt 4.3->4.4 upgrades on 3-node 
HCI setups, which I'd rather not do on the live physical system.

Of course, it's also quite cool!
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/TFTY52MYAFY6GDCZ4HVJJHI6Q7BIVTUK/


[ovirt-users] Re: Request for feedback about "Overview of Networking in oVirt"

2020-08-12 Thread thomas
One of the many cases where nested virtualization would be quite handy for 
testing...

If only it worked with oVirt on top of oVirt...
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/DWNBTCB7KF2MYQUYCQCYSUVV74LVB4ZG/


[ovirt-users] Re: Support for Shared SAS storage

2020-08-12 Thread thomas

> 
> Since 4.4 is the last feature release, this will never change even if it is 
> not
> documented.
> 
> Nir
Hi Nir,
could you please clarify: What does "last feature release" mean here: Is oVirt 
being feature frozen in preparation for something more drastic?
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/FKGE2VELIU3XBTWDC65CRETNCOTGEDWJ/


[ovirt-users] Re: Is the udev settling issue more wide spread? Getting failed 'qemu-img -convert' also while copying disks between data and vmstore domains

2020-08-11 Thread thomas
Hi Strahil, no updates, especially since I am stuck on 4.3.11 and they tell us 
it's final.
Glusterfs is 6.9-1.el7.

Apart from those three VMs and the inability to copy disks the whole farm runs 
fine so far.

Where would I configure the verbosity of logging? Can't find an obvious config 
option.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/FSUDKB6LESGMG7SONNDL45V5KJE4LOSW/


[ovirt-users] Re: Request for feedback about "Overview of Networking in oVirt"

2020-08-11 Thread thomas
Thanks for putting in the effort!

I learned a lot of new things.
I also learned that I need to learn a few more now.
The table could use some alternating background or a grid: Too easy to get lost.

Environments change over time, e.g. you find you really should have split the 
management, storage, migration and north-south networks, but any documentation 
I find says "do it right from the start".

So I wonder if there is any easy enough way to split out the storage network to 
a newly added second set of NICs in a HCI environment, or if a re-install is 
really the only reasonable thing to do.

They have created such a nice-looking GUI around the whole network 
configuration stuff, but from what I have experienced, just hitting buttons is 
very dangerous there, while many of them don't have a description or help item.

Since you're at it: I have been able to make nested virtualization work to a 
degree. A 3-node HCI (physical) is hosting another 3-node HCI (virtual); the 
virtual oVirt Cockpit deployed a working gluster, launched and prepared the 
HostedEngine at the nested level, and managed to move it onto the Gluster, and 
if I boot the virtual nodes, their VDSM will launch the nested hosted engine, 
but that machine can't see the network any more. I can connect to it via 
hosted-engine --console, it has an Ethernet interface, but no traffic gets 
through either way: Ideas? What do I look for?

I am wildly guessing that ovn has nesting issues, so would going the Linux 
bridge approach help there? How do I choose between the two? Must I leave my 
beloved Cockpit wizard and use the script installer?
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/IWR5B5EUYF5JBNWSHATJQ3RH7SIXWHBH/


[ovirt-users] Re: Hosted Engine Deployment stuck at 3. Prepare VM

2020-08-11 Thread thomas
I came across this in an environment where the proxy was very slow. The 
downloaded image for the appliance gets deleted every so often (and perhaps 
also via the cleanup script), so when it needs to be reloaded from the 
internet, it can take its time, because it's quite large (>1GB).

Another source of trouble can be that PackageKit or simply another instance of 
yum/dnf is already running and that is blocking the download. Check with htop 
or similar.
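
A quick way to check for that (names as on EL7/EL8; PackageKit may not be 
installed at all on your host):

ps -eo pid,comm,args | grep -E '[y]um|[d]nf|[p]ackagekitd'   # anything else holding the package lock?
systemctl stop packagekit                                     # if PackageKit turns out to be the culprit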
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/HDY6W2N2VJURBPP35YWDDRPCL6F2HUY4/


[ovirt-users] Is the udev settling issue more wide spread? Getting failed 'qemu-img -convert' also while copying disks between data and vmstore domains

2020-08-11 Thread thomas
While trying to diagnose an issue with a set of VMs that get stopped for I/O 
problems at startup, I try to deal with the fact that their boot disks cause 
this issue, no matter where I connect them. They might have been the first 
disks I ever tried to sparsify and I was afraid that might have messed them up. 
The images are for a nested oVirt deployment and they worked just fine, before 
I shut down those VMs... 

So I first tried to hook them as secondary disks to another VM to have a look, 
but that just caused the other VM to stop at boot.

Also tried downloading, exporting, and plain copying the disks to no avail; OVA 
exports of the entire VM fail again (fix is in!).

So to make sure copying disks between volumes *generally* work, I tried copying 
a disk from a working (but stopped) VM from 'vmstore' to 'data' on my 3nHCI 
farm, but that failed, too!

Plenty of space all around, but all disks are using thin/sparse/VDO on SSD 
underneath.

Before I open a bug, I'd like to have some feedback if this is a standard QA 
test, this is happening to you etc.

Still on oVirt 4.3.11 with pack_ova.py patched to wait for the udev settle.

This is from the engine.log on the hosted-engine:

2020-08-12 00:04:15,870+02 ERROR 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engineScheduled-Thread-67) [] EVENT_ID: 
VDS_BROKER_COMMAND_FAILURE(10,802), VDSM gem2 command HSMGetAllTasksStatusesVDS 
failed: low level Image copy failed: ("Command ['/usr/bin/qemu-img', 'convert', 
'-p', '-t', 'none', '-T', 'none', '-f', 'raw', 
u'/rhev/data-center/mnt/glusterSD/192.168.0.91:_vmstore/9d1b8774-c5dc-46a8-bfa2-6a6db5851195/images/aca27b96-7215-476f-b793-fb0396543a2e/311f853c-e9cc-4b9e-8a00-5885ec7adf14',
 '-O', 'raw', 
u'/rhev/data-center/mnt/glusterSD/192.168.0.91:_data/32129b5f-d47c-495b-a282-7eae1079257e/images/f6a08d2a-4ddb-42da-88e6-4f92a38b9c95/e0d00d46-61a1-4d8c-8cb4-2e5f1683d7f5']
 failed with rc=1 out='' err=bytearray(b'qemu-img: error while reading sector 
131072: Transport endpoint is not connected\\nqemu-img: error while reading 
sector 135168: Transport endpoint is not connected\\nqemu-img: error while 
reading sector 139264: Transport
  endpoint is not connected\\nqemu-img: error while reading sector 143360: 
Transport endpoint is not connected\\nqemu-img: error while reading sector 
147456: Transport endpoint is not connected\\nqemu-img: error while reading 
sector 151552: Transport endpoint is not connected\\n')",)

and this is from the vdsm.log on the gem2 node:
Error: Command ['/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T', 
'none', '-f', 'raw', 
u'/rhev/data-center/mnt/glusterSD/192.168.0.91:_vmstore/9d1b8774-c5dc-46a8-bfa2-6a6db5851195/images/aca27b96-7215-476f-b793-fb0396543a2e/311f853c-e9cc-4b9e-8a00-5885ec7adf14',
 '-O', 'raw', 
u'/rhev/data-center/mnt/glusterSD/192.168.0.91:_data/32129b5f-d47c-495b-a282-7eae1079257e/images/f6a08d2a-4ddb-42da-88e6-4f92a38b9c95/e0d00d46-61a1-4d8c-8cb4-2e5f1683d7f5']
 failed with rc=1 out='' err=bytearray(b'qemu-img: error while reading sector 
131072: Transport endpoint is not connected\nqemu-img: error while reading 
sector 135168: Transport endpoint is not connected\nqemu-img: error while 
reading sector 139264: Transport endpoint is not connected\nqemu-img: error 
while reading sector 143360: Transport endpoint is not connected\nqemu-img: 
error while reading sector 147456: Transport endpoint is not 
connected\nqemu-img: error while reading sector 151552: Transport endpoint is 
not connected\n')
2020-08-12 00:03:15,428+0200 ERROR (tasks/7) [storage.Image] Unexpected error 
(image:849)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/image.py", line 837, in 
copyCollapsed
raise se.CopyImageError(str(e))
CopyImageError: low level Image copy failed: ("Command ['/usr/bin/qemu-img', 
'convert', '-p', '-t', 'none', '-T', 'none', '-f', 'raw', 
u'/rhev/data-center/mnt/glusterSD/192.168.0.91:_vmstore/9d1b8774-c5dc-46a8-bfa2-6a6db5851195/images/aca27b96-7215-476f-b793-fb0396543a2e/311f853c-e9cc-4b9e-8a00-5885ec7adf14',
 '-O', 'raw', 
u'/rhev/data-center/mnt/glusterSD/192.168.0.91:_data/32129b5f-d47c-495b-a282-7eae1079257e/images/f6a08d2a-4ddb-42da-88e6-4f92a38b9c95/e0d00d46-61a1-4d8c-8cb4-2e5f1683d7f5']
 failed with rc=1 out='' err=bytearray(b'qemu-img: error while reading sector 
131072: Transport endpoint is not connected\\nqemu-img: error while reading 
sector 135168: Transport endpoint is not connected\\nqemu-img: error while 
reading sector 139264: Transport endpoint is not connected\\nqemu-img: error 
while reading sector 143360: Transport endpoint is not connected\\nqemu-img: 
error while reading sector 147456: Transport endpoint is not 
connected\\nqemu-img: error while reading sector 151552: T
 ransport endpoint is not connected\\n')",)
2020-08-12 00:03:15,429+0200 ERROR (tasks/7) [storage.TaskManager.Task] 
(Task='6399d533-e96a-412d-b0c3-0548e24d658d') Unexpected 

[ovirt-users] Re: hosted-engine upgrade from 4.3 to 4.4 fails with "Cannot edit VM."

2020-08-10 Thread thomas
If the defalt blank template from a fresh install had enabled 
high-availability, that would be a bug.
But if someone had set this on their blank template, I can see how that would 
cause such an issue and I can just imagine the drama behind finding it... It 
seems common enough to code for it!
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/IS7FMYBTC4KTIECXEI3C7EOGXLCTQULM/


[ovirt-users] Re: OVA export creates empty and unusable images

2020-08-10 Thread thomas
> from oVirt can work without any change? This will make it easy to move
> from oVirt to desktop
> hypervisor and back with relatively little effort.
>
You know, when somebody gives you a button, you tend to assume it's even easier 
than moving disks :-) I wouldn't actually assume that OVA export does much more 
than export disk images and add a few tags for the common subset of settings 
between hypervisors. And I was likewise quite surprised at the amount of work 
done by OVA import, when VirtualBox/VMware images were recognized. It's nice to 
see when it works, but I wasn't really expecting that. Linux is so hypervisor 
enabled and comes with built-in guest additions, that I haven't seen the need 
when moving VMs between VB/VMware/Hyper-V. And with Windows VMs, starting with 
W7/W2012 it's become almost a no-op, just pop it in and it will adjust with 
little more than a reboot, unless you want those virtio drivers for speed. 
> 
> Thanks Thomas, this is very useful feedback!

I'll be happy to write up a long review of my experience if you would like 
that. I can type pretty fast, but few people like reading any more..
> 
> Nir
Thank you for your time and patience!
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/RV2UWK7KAOOPTT6JRR4X7IWUNZHZGIDF/


[ovirt-users] Re: Why OVA imports failed (the other reason...)

2020-08-07 Thread thomas
Here is the explanation, I think:
root 12319 12313 15 16:59 pts/000:00:56 qemu-img convert -O qcow2 
/dev/loop0 
/rhev/data-center/mnt/glusterSD/192.168.0.91:_vmstore/9d1b8774-c5dc-46a8-bfa2-6a6db5851195/images/3be7c1bb-377c-4d5e-b4f6-1a6574b8a52b/845cdd93-def8-4d84-9a08-f8c991f89fe3

This is where the image is entering from the OVA source and gets written on the 
Gluster.

I consistently chose one of the compute cluster nodes, because they have the 
bigger CPUs and it also happened to have the OVA file locally, so the network 
wouldn't have to carry source and sink traffic...

But unless I use one of the nodes that actually have bricks in the Gluster, I 
get this strange silent failure.

First thing I did notice is that during the import dialog, more details about 
the engines are actually visible (disk and network details), before I actually 
launch the import, although Cockpit doesn't seem to care.

I then theorized, that the compute nodes won't actually mount the Gluster file 
system on /rhev, so the target would be missing... but they do in fact...

I'll not go into further details but take away this lesson:

"If you want to import an OVA files to a Gluster farm, you must use a node 
which is part of the gluster to do the import on."
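
If you want to verify that up front, a quick check on the host you pick in the 
import dialog (the OVA path is an example, the gluster path is from my farm):

ls -lh /var/tmp/exports/tdc.ova                               # the OVA must be a local path on that host
grep glusterSD /proc/mounts                                   # the target domain must be mounted there
df -h /rhev/data-center/mnt/glusterSD/192.168.0.91:_vmstore   # and reachable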

Ah, yes, the import succeeded and oVirt immediately chose the nicely bigger 
Xeon-D node to run the VM...

Now I just need to find out how to twiddle OVA export files from oVirt to make 
them digestible for VMware and VirtualBox...
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/QVN6BMIYATKZF6K3EO7HU5FRJD4LBIA2/


[ovirt-users] Why OVA imports failed (the other reason...)

2020-08-07 Thread thomas
Empty disks from exports that went wrong didn't help. But that's fixed now, 
even if I can't fully validate the OVA exports on VMware and VirtualBox.

The export/import target for the *.ova files is an SSD hosted xfs file system 
on a pure compute Xeon D oVirt node, exported and automounted to the 3nHCI 
cluster, also all SSD but J5005 Atoms.

As I import the OVA file, I choose the Xeon-D as the import node and the *local 
path* on that host for the import. Cockpit checks the OVA file, detects the 
machine inside, lets me select and choose it for import, potentially overriding 
some parameters, lets me choose the target storage volume, sets up the job... 
and then fails, rather silently and with very little in terms of error 
reporting ("connection closed" is the best I got).

Now that same process worked just fine on a single node HCI cluster (also J5005 
Atom), which had me a bit stunned at first, but gave a hint as to the cause: 
Part of the import job, most likely a qemu-img job, isn't run on the machine 
you selected in the first step, and unless the path is global (e.g. external 
NFS), it fails.

If someone from the oVirt team could check and validate or disprove this 
theory, that could be documented and/or added as a check to avoid people 
falling into the same trap.

While I was testing this using a global automount path, my cluster failed me 
(creating and deleting VMs a bit too quickly?) and I had to struggle for a 
while to have it recover.

While those transient failures are truly frightening, oVirt's ability to recover 
from these scenarios is quite simply awesome.
I guess it's really mostly miscommunication and not real failures, and oVirt has 
lots of logic to rectify that.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/4QJ4KEOMBBF5KV3LMPEXTGHHV67ZL2LG/


[ovirt-users] Re: Has anyone ever had success importing an OVA VM exported from oVirt to VirtualBox or VMware? On Windows?

2020-08-07 Thread thomas
Not yet, because I was fighting a nasty outage, evidently created from 
importing, starting and deleting images too quickly for my Atoms + Xeon-D based 
test farm...

But it's on the list :-)

Ciao, Thomas
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/RFQAKKYS6LP3TYEEVOGY7BWIZ5URKHCZ/


[ovirt-users] Has anyone ever had success importing an OVA VM exported from oVirt to VirtualBox or VMware? On Windows?

2020-08-06 Thread thomas
After applying the OVA export patch that ensured disk content was actually 
written into the OVA image, I have been able to transfer *.ova VMs between two 
oVirt clusters. There are still problems that I will report once I have fully 
tested what's going on, but in the meantime, for all those who want to be able 
to move VMs not only in the direction of oVirt (which seems well tested), but 
also the other way (e.g. local desktop use, or even a vSphere farm), an OVA 
export could be a life saver--if it worked.

I have tried importing oVirt exported VMs (which correctly imported and ran on 
a second oVirt host) both on VMware workstation 15.5.6 and VirtualBox 6.1.12 
running on a Windows 2019 host without success.

I've also tried untaring the *.ova into the component *.ovf + image file, but 
the issue seems to be with the XML description which has both imports fail 
pretty much instantly, before they even read the disk image file.

From what I can tell, the XML created by qemu-img looks just fine, the error 
messages emitted by both products are very sparse and seem misleading.

E.g. VirtualBox complains "Empty element ResourceType under Item element, line 
1.", while I see all Item elements well constructed with a ResourceType in every 
one of them.

VMware Workstation is completely useless in terms of log-file-diagnostics: It 
reports an invalid value for the "populatedSize" for the "Disk", but doesn't 
log that to the ovftool.log. I am pretty sure the error is in the inconsistent 
interpretation of this section, where the AllocationUnits, capacity and 
populatedSize leave some room for misinterpretation:

-

List of Virtual Disks

<Disk ovf:format="http://www.vmware.com/specifications/vmdk.html#sparse"
      ovf:fileRef="974ad04f-d9bb-4881-aad2-c1d5404200ef" ovf:parentRef=""
      ovf:populatedSize="536953094144" ovf:capacityAllocationUnits="byte * 2^30"
      ovf:capacity="500" ovf:diskId="02b8d976-7e42-44e7-8e24-06f974f1c3ea"/>



I am trying to import the *.ova files on a Windows host, and the sparseness 
typically gets lost in the scp file transfer, too.
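
If you need to keep the holes across the copy, something along these lines should 
work better than scp (file names are examples; rsync on the Windows side would mean 
cwRsync or WSL):

rsync --sparse --progress bigvm.ova user@windows-box:/c/Temp/   # -S/--sparse keeps the holes
# or pack the holes away before the transfer:
tar -S -czf bigvm.ova.tgz bigvm.ova                             # GNU tar --sparse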

Yes, this most likely isn't an oVirt bug, qemu-img is maintained elsewhere and 
the problem to me looks more on the receiver side.

But have you tried it? Can you imagine it being important, too?
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/YQI3ZUSZHKPDCHG6ECNBCPQ472HLQTAZ/


[ovirt-users] Re: oVirt Hyperconverged question

2020-08-06 Thread thomas
I have done that, even added five nodes that contribute a separate Gluster file 
system using dispersed (erasure codes, more efficient) mode.

But in another cluster with such a 3-node-HCI base, I had a lot (3 or 4) of 
compute nodes, that were actually dual-boot or just shut off when not used: 
Even used the GUI to do that properly.

This caused strange issues as I shut down all three compute-only nodes: Gluster 
reported loss of quorum, and essentially the entire HCI lost storage, even if 
these compute nodes didn't add bricks to the Gluster at all. In fact the 
compute nodes probably shouldn't have even participated in the Gluster, since 
they were only clients, but the Cockpit wizard added them anyway.

I believe this is because HCI is designed to support adding extra nodes in sets 
of three e.g. for a 9-node setup, which should be really nice with 7+2 disperse 
encoding.

I didn't dare reproduce the situation intentionally, but if you should come 
across this, perhaps you can document and report it. If (most of) the extra 
nodes are permanently running, you don't need to worry.

In terms of regaining control, you mostly have to make sure you turn the 
missing nodes back on; oVirt can be astonishingly resilient. If you then remove 
the nodes prior to shutdown, the quorum issue goes away.
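
What I check before shutting compute-only nodes down these days (the volume name 
is an example from my setup):

gluster peer status                                     # every connected peer counts toward server quorum
gluster volume get vmstore cluster.server-quorum-type   # 'server' means losing peers can take bricks down
gluster volume get vmstore cluster.server-quorum-ratio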
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/A4EDM3RYVIYXZ5QAJO4VOYKQUDWYDA4P/


[ovirt-users] Re: Thin Provisioned to Preallocated

2020-08-06 Thread thomas
If OVA export and import work for you, you get to choose between the two at 
import.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/A66DP3GH3C32MFPDUHAZR3GZLXOL7UXG/


[ovirt-users] Do distinct export domains share a name space? Can't export a VM, because it already exists in an unattached export domain...

2020-08-05 Thread thomas
After OVA export/import was a) recommended against b) not working with the 
current 4.3 on CentOS 7, I am trying to make sure I keep working copies of 
critical VMs before I test if the OVA export now works properly, with the 
Redhat fix from 4.4 applied to 4.3.

Long story short, I have an export domain "export", primarily attached to a 3 
node HCI gluster-cluster and another domain "exportMono", primarily attached to 
a single node HCI gluster-cluster.

Yes I use an export domain for backup, because, ...well there is no easy and 
working alternative out of the box, or did I overlook something?
But of course, I also use an export domain for shipping between farms, so 
evidently I swap export domains like good old PDP-11 disk cartridges or at 
least I'd like to.


I started by exporting VM "tdc" from the 1nHCI to exportMono, then reattached 
that to 3nHCI for an import. The import worked fine, the transfer succeeded. So 
I detach exportMono, which belongs to the 1nHCI cluster.

Next I do the OVA export on 1nHCI, but I need to get the working and 
reconfigured VM "tdc" out of the way on 3nHCI, so I dump it into the "export" 
export domain belonging to 3nHCI, because I understand I can't run two copies 
of the same VM on a single cluster.

Turns out I can't export it, because even if the export domain is now a 
different one and definitely doesn't contain "tdc" at all, oVirt complains that 
the volume ID that belongs to "tdc" already exists in the export domain.

So what's the theory here behind export domains? And what's the state of their 
support in oVirt 4.4?

I understand that distinct farms can't share an export domain, because they 
have no way of coordinating properly. Of course I tried to use one single NFS 
mount for both farms, but the second farm properly detected the presence of 
another and required a distinct path.

But from the evidence before me, oVirt doesn't support or like the existence of 
more than one export domain either: Something that deserves a note or 
explanation.

I understand they are deprecated in 4.3 already, but since they are also the 
only way to manage valuable VM images moving around, that currently works with 
the GUI on oVirt 4.3 it would be nice to have these questions answered.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/6H4NAMPIC2M72JO7KLC2WCBFLYT2PPZR/


[ovirt-users] Re: OVA export creates empty and unusable images

2020-08-04 Thread thomas
Hi Gianluca,

I have had a look at the change and it's a single line of code added on the 
hosted-engine. I'll verify that it's working on 4.3 and will make a note of 
re-checking it after engine upgrades, which for now seems the less troublesome 
approach.

Hopefully it will get integrated/backported also with 4.3 or I'll have a chance 
to validate 4.4.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/DKXQ7GT3SMUXJ2BG77PRA2H2FLN234HA/


[ovirt-users] Re: OVA export creates empty and unusable images

2020-08-04 Thread thomas
Dear Nir, 

I am sorry if that sounded too harsh. I do appreciate what you are doing!

The thing is, that my only chances at getting my employer to go with the 
commercial variant depend a lot on the upstream project showing already 
demonstratable usability in the research lab.

Just try to imagine how it feels when you do exports for bullet-proof external 
snapshots, rebuild a farm and find the exports to be duds on import, when 
everything seemed just fine and the files had the proper size.

And then imagine planning a migration to a new oVirt release, where going 
backwards or having mixed nodes isn't really an option (and export domains are 
already deprecated in the old release), while nested oVirt, the perfect 
environment to test that migration... turns out to not work at all.

I hope venting some frustration as nicely as I can will help you argue 
internally and help us all create and use a better product.

Thanks for your detailed replies!
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/7URIALYXZ3XTCFYYK4JUDMA65IVAIOJK/


[ovirt-users] Re: OVA export creates empty and unusable images

2020-08-04 Thread thomas
Nir, first of all, thanks a lot for the detailed description and the quick fix 
in 4.4!

I guess I'll be able to paste that single line fix into the 4.3 variant 
myself, but I'd rather see that included in the next 4.3 release, too: How 
would that work?

---
"OVA is not a backup solution":

From time to time, try to put youself into a user's shoes.

The first thing you read about Export Domains is that they are deprecated: That 
doesn't give you the warm fuzzy feeling that you are learning something useful 
when you start using them, especially in the context of a migration to the next 
release.

OVA on the other hand, stands for a maximum of interoperability and when given 
a choice between something proprietary and deprecated and a file format that 
will port pretty much everywhere, any normal user (who doesn't have the code 
behind the scene in his mind), will jump for OVA export/import.

Also it's just two buttons, no hassle, while it took me a while to get an 
Export domain defined, filled, detached, re-attached, and tested.

Again from a user's perspective: HCI gluster storage in oVirt is black magic, 
especially since disk images are all chunked up. For a user it will probably 
take many years of continuous oVirt operation until he's confident that he'll 
recover VM storage in the case of a serious hiccup, and that whatever might have 
gone wrong or whatever bitrot might have occurred won't carry over to an export 
domain. OVA files seem like a nice bet to recover your VM on whatever platform 
you can get back running in a minor disaster.

In many cases, it doesn't even matter you have to shut down the machine to do 
the export, because the machines are application level redundant or simply it's 
ok to have them down for a couple of minutes, if you know you can get them back 
up no matter what in a comparable time frame, oVirt farm dead or alive, e.g. on 
a bare metal machine.

And then my case, many of the images are just meant to move between an oVirt 
farm and a desktop hypervisor.

tl;dr

(working) OVA export and import IMHO are elemental and crucial functionality, 
without which oVirt can't be labelled a product.

I completely appreciate the new backup API, especially with the consistency 
enabled for running VMs; perhaps a little less that I'd have to purchase an 
extra product to do a fundamental operation with a similar ease as the OVA 
export/import buttons, but at least it's there.

That doesn't mean OVA import/export isn't important, or that a shared 
import/export domain wouldn't be nice, too.

Thanks for your time!
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/5E7THHYOGS3YDMG63WHB2JQPSST7PKXA/


[ovirt-users] Re: OVA export creates empty and unusable images

2020-08-01 Thread thomas
Unfortunately that means OVA export/import doesn't seem to be a QA test case 
for oVirt releases... which would make me want to drop it like a hot potato, if 
there was any alternative at this point.

I guess my biggest mistake was to assume that oVirt was like CentOS.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/ODJEYA6RW7NHFQ2YO3GW2KPL2SUMIHQS/


[ovirt-users] Re: OVA export creates empty and unusable images

2020-08-01 Thread thomas
I confirm, I get "qemu-img: /dev/loop0: error while converting qcow2: Could not 
open device: Permission denied", too.

You've nailed it and Nir has pasted your log into the bug report 1862115.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/KEMZAXVODHW342X75PJLEZPF5CZSM576/


[ovirt-users] Re: OVA export creates empty and unusable images

2020-07-31 Thread thomas
Wow, it seems a good find for a potential reason!

I'd totally subscribe to the cause if it only affected my Atom HCI clusters, 
which have had timing issues so bad that even the initial 3-node HCI setup had 
to be run on a 5 GHz Kaby Lake to succeed, because my J5005 Atoms evidently were 
too slow.

But I see the very same effect also on another farm, not using quite the newest 
generation of hardware, but still a 28-core Broadwell Xeon, so if timing is the 
issue, somebody from QA needs a serious performance review.

Did you get a chance to test the OVA export/import cycle?
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/YG7226PSQYDKSCVHPON5IAC3PT4ZNVJX/


[ovirt-users] Re: testing ovirt 4.4 as single node VM inside 4.3

2020-07-31 Thread thomas
Hi Nir,

performance is not really an issue here, we're most interested in functional 
testing and migration support.

That's where nesting 4.4 on top of 4.3 would be a crucial migration enabler, 
especially since you don't support a 7/8 or 4.3/4.4 mix that much elsewhere.

Currently my tests reveal that oVirt-on-oVirt nesting does not work at all; 
oVirt on KVM is the only case used/tested by Redhat internally.

The other issue is that OVA exports on oVirt fail silently, which is about the 
worst to happen. I get full sized OVA files full of zeros that obviously won't 
work imported (without errors, btw).

And for lack of 4.4 nested on 4.3, I couldn't even tell if the export domains 
already deprecated on 4.3 would work at all on 4.4, which would be the only 
other alternative to migrate VMs.

Please, please, please, if you can make OVA export/import, export domains and 
even nested oVirt working, I'd kiss your feet (wearing my Corona mask, of 
course ;-)
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/3BODKC2D7DGNMUQ5UXC3S6QYP6LD56U5/


[ovirt-users] Re: oVirt 4.4.1 HCI single server deployment failed nested-kvm

2020-07-31 Thread thomas
Just looking at the release notes, using anything 'older' in the case of 4.4 
seems very rough going... 

Testing 4.4 was actually how I got into the nesting domain this time around, 
because I didn't have physical hardware left for another HCI cluster.

Testing of oVirt 4.4 on top of oVirt 4.3 including migration scenarios is 
probably the most sensible use of nesting overall.

Alas I had similarly bad luck during my first evaluation of oVirt about a year 
(or two?) ago, when I tried nesting oVirt on top of a big windows workstation 
with VMware Workstation: Everything seemed to work just fine, but nested VMs 
just stopped right at boot, without any error, but also no progress.

I had a vSphere cluster run no problem on that setup...
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/XRIK4SK7TFQS7JASUPCVFGWCJI6YS6SM/


[ovirt-users] Re: oVirt 4.4.1 HCI single server deployment failed nested-kvm

2020-07-31 Thread thomas
That's exactly the point: Running oVirt on top of a vanilla KVM seems to be the 
only case supported, because the Redhat engineers use that themselves for 
development, testing and demoing.

Running virtual machines nested inside an oVirt VM via KVM also works per se: 
HostedEngineLocal during the setup is working and can be talked to etc. from 
the host that is running the setup, because it uses a local bridge. You can 
even reach everything on the outside from within that nested VM, but the VM 
itself is only accessible from the host at that point (normal in all cases).

But in the next stage of the installation that locally prepared VM is modified 
to run on the cluster storage (gluster in the HCI case) and to use the overlay 
network, and that's where it loses network connectivity, I guess because overlay 
networks lack nesting support (I guess there is still a non-overlay way to run 
oVirt and then perhaps it even works...)

And while I admire that network overlay stuff, it's also black magic to me, and 
evidently not trivial if not downright impossible to resolve, which is most 
likely why the Redhat engineers don't seem to pursue that path. At least that's 
what I read from this comment, which really should be somewhere right next to 
wherever "nested virtualization" is ever mentioned, to ensure expectations are 
properly managed: 
https://lists.ovirt.org/pipermail/users/2017-March/080219.html 
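
For completeness, the checks I run on the physical host before even trying to 
nest (Intel module names shown, AMD uses kvm_amd):

cat /sys/module/kvm_intel/parameters/nested   # Y or 1 means nested VMX is enabled
virt-host-validate qemu                       # general KVM/host sanity check
lsmod | grep kvm                              # kvm_intel/kvm_amd should be loaded inside the L1 VM, too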
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/G5HWL5XN3EEI3UHQACSUNTXVNFZ2V56X/


[ovirt-users] Re: It is possible to export a vm bigger as 5 TB?

2020-07-30 Thread thomas
Thanks to your log post, I was able to identify the proper place to find 
information about the export process.

OVA exports don't fail for me, but result in empty disks, which is perhaps even 
worse.

Two things I can suggest: Try with a new and smaller VM to see if it's actually 
a size-related issue or due to the fact that e.g. the number of disks in an OVA 
export is limited.

The other is to make do with an export domain, which seems to work fine and 
might have fewer issues with OVA file format constraints: Don't know if you just 
need the backup or want to migrate...
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/5ODWFJPBRG5YVNK7DZYVZNYXIY36BRRW/


[ovirt-users] Re: OVA export creates empty and unusable images

2020-07-30 Thread thomas
Export/Import/Transfer via export domain worked just fine. Just moved my last 
surviving Univention domain controller from the single node HCI 
disaster-recovery farm to the three node HCI primary, where I can now try to 
make it primary and start building new backups...

Backups that do not work are food for nightmares!
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/AGK7KEY3RRKOVUBDGYPZO2JCQHIEA2BR/


[ovirt-users] Re: OVA export creates empty and unusable images

2020-07-30 Thread thomas
Here is the ticket: https://bugzilla.redhat.com/show_bug.cgi?id=1862115 
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/W4BMX4FPWL3LVR26K637UJ43YJ7E2KUT/


[ovirt-users] Re: OVA export creates empty and unusable images

2020-07-30 Thread thomas
Buongiorno Gianluca,

I am entering the data in bugzilla right now...

Logs: There is a problem, I have not found any log evidence of this export and 
import, I don't even know which component is actually doing this.

For 'foreign' imports like VMware and other *.vmdk producing hypervisors, VDSM 
seems to be doing the job and leaves logs in /var/log/vdsm/import, describing 
the format conversion using virt-v2v.

For these 'domestic' imports, there is no such log while the imports seem to 
complete, but with little or no disk data.

I am just validating that at least the transfer between farms or the 
export/import via the deprecated export domain still works.

It's quite messy and unintuitive as buttons to detach an export domain don't 
enable, until you have found a way to put them into maintenance, which for some 
reason has to be done from the datacenter etc.

It's still importing on the target as I write this...

Yes, I guess some validation by others would be lovely, just make sure you 
don't delete any VM you might want back later ;-)
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/CPZOYGX6YKHZXADYOEZURHDFGGSVQ7YX/


[ovirt-users] OVA export creates empty and unusable images

2020-07-30 Thread thomas
I've tested this now in three distinct farms with CentOS7.8 and the latest 
oVirt 4.3 release: OVA export files only contain an XML header and then lots of 
zeros where the disk images should be.

Where an 'ls -l <vm>.ova' shows a file about the size of the disk, 'du -h 
<vm>.ova' shows mere kilobytes, and 'strings <vm>.ova' dumps the XML and 
then nothing but repeating zeros until the end of the file.

Exporting existing VMs from a CentOS/RHEL 7 farm and importing them after a 
rebuild on CentOS/RHEL 8 would seem like a safe migration strategy, except when 
OVA export isn't working.

Please treat with priority!
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/UY6WYFZKDUA7IVAJH7UKZVTMFA5CX34X/


[ovirt-users] Re: oVirt thrashes Docker network during installation

2020-07-29 Thread thomas
Hi Jp, while the project looks intriguing, it also still looks very wet behind 
the ears and nothing I could do on a side-job time budget.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/TTJL3A737AY6L35S5Q4PJ4H2U7X2SIV4/


[ovirt-users] Re: Non storage nodes erronously included in quota calculations for HCI?

2020-07-29 Thread thomas
Sorry Strahil, that I didn't get back to you on this...

I dare not repeat the exercise, because I have no idea if I'll get out of such 
a complete break-down again cleanly.

Since I don't have duplicate physical infrastructure to just test this 
behavior, I was going to use a big machine to run a test farm nested.

Spent about a week in trying to get nesting work, but it ultimately failed to 
run the overlay network of the hosted engine properly (separate post somewhere 
here).

And then I read, in a response to a post way back here, that oVirt nested on oVirt isn't just "not supported" but known (although not documented or advertised) not to work at all.

So there went the chance to reproduce the issue...

What I find striking is that the 'original' oVirt or RHV from pre-Gluster HCI days seems to support the notion of shutting down compute nodes when there isn't enough workload to fill them, in order to save energy. In an HCI environment that obviously doesn't play well with the Gluster storage nodes, but pure compute nodes should still support cold standby.

Can't find any documentation on this, though.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/JX3A7DJBQVM4UQ5RQ5QA6GFAWPVTO42V/


[ovirt-users] Re: Hosted Engine Deployement

2020-07-29 Thread thomas
I might have run into similar issues at one point...

And it might be a case where the network configuration on the installation host has already been partially changed to include the bridge that's used to communicate with HostedEngineLocal.

The installation is not able to pick up cleanly from every possible intermediate point, so sometimes you have to restart from a relatively clean slate by running 'ovirt-hosted-engine-cleanup' first.
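
As a rough sketch, assuming you then redeploy from the command line rather than Cockpit:
# wipe the leftovers of the failed attempt (local VM, bridges, config)
ovirt-hosted-engine-cleanup
# then start over
hosted-engine --deploy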
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/E2BZHZFAZ66M4XAUJYNT6HMNM6FVGJZT/


[ovirt-users] Re: OVA import does not upload disks

2020-07-29 Thread thomas
I have the same issue, it seems, but AFAIK the problem is on the export side:

While the OVA's logical file size matches the size of the VM's disk, the actual storage usage on the VDO-compressed and deduplicated NFS share I use as export target, per 'du -h *.ova', is a mere 28 KB, while 'strings <name>.ova' reports an XML header and then nothing but zeros.

I sure hope it's easy to fix and not a "we don't support importing oVirt OVA exports..."
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/DP77GIX7DI2OM67H5TVJIZGYPTO6YBT4/


[ovirt-users] Re: Problem with paused VMs in ovirt 4.3.10.

2020-07-29 Thread thomas
Hi Damien, I'm afraid nobody will be able to help you here. What made the CPU jump into a sea of "FF FF FF FF..." can't be diagnosed remotely, but it's KVM that stopped the VM, because that's not legal x86-64 code, and there is nothing oVirt can do about it.

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/YEWK3EIS6AMKHZFY7TDIUVRJVXDAWB6Z/


[ovirt-users] Re: Ovirt hypervisor as an DRBD node

2020-07-29 Thread thomas
I believe the oVirt team thought it easier to implement on GlusterFS, which 
AFAIK does pretty much what you want to do, but comes supported as part of the 
product as an (almost-)ready-to-use HCI.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/EQDGBBLMDKMECZXMS5QCMK5Y2VGEE4SO/


[ovirt-users] Re: Management Engine IP change

2020-07-29 Thread thomas
I honestly don't know, because I have not tried myself, so you might just want 
to stop reading right here.

But from what I understand about the design philosophy of oVirt, it should be OK to change it, although probably nobody has ever tested it and everybody would tell you it's a bad idea unless you're willing to rebuild from scratch.

The reason it *should* be ok is that the nodes don't care about the management 
engine.
They care only about the information on the shared cluster storage.
How that gets there, who changes it: They couldn't care less.

Yet I believe I have seen events being reported back to the management engine via REST API calls, referring to the management engine via URIs that should have been using the FQDN only. So there is some bilateral communication going on, mostly asynchronous as far as I can see, and not using IPs.

What I can tell you is that /etc/sysconfig/network-scripts/ifcfg-eth0 has NM_CONTROLLED=no inside, so nmtui won't go near it. You can change that, delete the line, edit the config... and hopefully it will live.
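
Purely as an illustration, and assuming eth0 really is the interface behind the ovirtmgmt bridge on your engine (back the file up first), the change I mean looks roughly like this:
# /etc/sysconfig/network-scripts/ifcfg-eth0 (excerpt)
# before: NetworkManager keeps its hands off the interface
NM_CONTROLLED=no
# after: drop or comment the line so nmtui/nmcli will manage it again
#NM_CONTROLLED=no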

And in case things go seriously wrong, you should be able to fix it with 
hosted-engine --console

But, again, I never tried it myself.

But I use DHCP on two out of four farms...
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/JGIDCFQGTEO4NTS45OLXWSL6POP4ZD5T/


[ovirt-users] Re: where is error log for OVA import

2020-07-29 Thread thomas
From what I have observed, OVA import seems to have two modes:

If the OVA is in a 'foreign' format, the OVA file needs to be converted, and that conversion effort is then logged under /var/log/vdsm/import on the oVirt node where the import runs. Import failures are then mostly an issue with the inability to open either the OVA file itself for writing or perhaps another temporary file right beside it. Under the assumption that the conversion was running under the UID/GID of vdsm, I ensured that write permissions on file and directory were given to vdsm:kvm, and at that point the conversion ran fine and spewed plenty of logging output to a file on that path.
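
In my case the fix boiled down to something like the following; /data/ova and guest.ova are just stand-ins for wherever you staged the OVA:
# let vdsm (and the kvm group) read the OVA and write next to it
chown vdsm:kvm /data/ova /data/ova/guest.ova
chmod 775 /data/ova
chmod 664 /data/ova/guest.ova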

Now when the OVA came from an oVirt export, that log file doesn't seem to get created; at least I never saw anything appear there, even after the import had, according to the GUI, finished successfully.

But those OVA re-imports were completely broken, mostly because the export files seem to be defective, an error I just reported in another post.

For you, with a VMware export, things might look brighter once you sort out access rights and potentially filesystem capacity, as the conversion might require enough temporary space for two copies of the VM, since it only seems to be put into oVirt storage after the conversion.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/GUGYOHFLP7XCRRBOG3UWTZOYMR7LCPWS/


[ovirt-users] Re: oVirt 4.4.1 HCI single server deployment failed nested-kvm

2020-07-29 Thread thomas
I tried using nested virtualization, too, a couple of weeks ago.

I was using a 3-node HCI CentOS 7.8 cluster and I got pretty far. Configuring KVM to work with nested page tables isn't all that well documented, but I got there; I even installed some host extensions that seem required.
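
For reference, the host-side part I ended up with was roughly this sketch (Intel assumed; on AMD the module is kvm_amd and the parameter file sits under kvm_amd instead). The nested VMs also need the vmx flag, e.g. via host CPU pass-through in the cluster or VM settings:
# enable nested virtualization for kvm_intel persistently
echo "options kvm_intel nested=1" > /etc/modprobe.d/kvm-nested.conf
# reload the module (all VMs on the host must be down) and verify
modprobe -r kvm_intel && modprobe kvm_intel
cat /sys/module/kvm_intel/parameters/nested   # should print Y or 1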

Even the actual nesting, that is a VM running inside a VM, did work: the setup came to the point where it ran the hosted engine on temporary local storage, before it gets picked up, fixed up to run on the Gluster storage and restarted there. But that process failed eventually, evidently because the overlay network doesn't support nesting. Where the initial hosted engine uses a local bridge with the (in this case virtual) host--and that works--afterwards it uses the overlay network, and that evidently doesn't.

It was only then that I ran across a very obscure message somewhere in this mailing list saying that oVirt on top of oVirt in fact does not work at all! Up to that point it just wasn't "supported".

When a hypervisor producer speaks of nested virtualization support, I would 
understand it to mean that you can run their product under their product, 
ideally also somebody else's product. I've run ESX on VMware workstation and 
that was pretty cool.

In the case of oVirt, from what I have gathered (and I'd love to be wrong), it is only supposed to mean that you can run oVirt on top of KVM.

Not the other way around, nor in any other way most likely.

To me that looks much more like an internal Red Hat development facility than a product feature.

Of course I mostly wish they'd find a way to make it work like it does on VMware.
But next on my wishlist would be an explicit description of what does and what doesn't work.

As it was, it turned into almost a week of a me-against-the-computer adventure that I lost.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/7WHRDMXMAWW3DPS24X23X3GCY5S2MCZF/


[ovirt-users] OVA exported from oVirt cannot be imported in oVirt

2020-07-29 Thread thomas
I have successfully imported OVA files which came from VirtualBox before, 
something I'd consider the slightly greater hurdle.

Exporting a VM to an OVA file and then importing it on another platform can 
obviously create compatibility issues, but oVirt <-> oVirt exports and imports 
with the same (or similar) release should obviously just work, right?

Sadly, that doesn't seem to be the case.

I've just exported a Linux VM, a Univention domain controller running Debian 9, to an OVA and tried importing it on another farm.
The process finished without an error, but the machine wouldn't boot.

Closer inspection reveals that the exported OVA file is about 100 GB in size, which corresponds to the size of the thinly allocated primary disk, but it actually contains only 28 KB of data (the XML header), while the disk is all zeros ('strings <name>.ova').
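
Since an OVA is just a tar archive, another way to see whether the disk payload actually made it in, sketched here with myvm.ova as a stand-in name:
# list the members: the OVF descriptor plus the disk image(s), with their sizes
tar tvf myvm.ova
# unpack into a scratch directory and ask qemu-img about the disk member
mkdir /tmp/ova-check && tar xvf myvm.ova -C /tmp/ova-check
qemu-img info /tmp/ova-check/<name-of-the-disk-member-listed-above>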

Another VM I had exported a week ago contains about 2.3GB of data, but that 
machine also doesn't boot. When I attach its disk to another VM as a secondary, 
the file system seems to contain bogus data, most directories are unreadable 
and an fsck goes on for minutes.

Export domains are deprecated, but when I export the original and runnable VM there, I get 23 GB, which corresponds to what the VM is actually using. Unfortunately that doesn't give me the mobility for the VM that I desire, especially since I cannot have a shared export/import domain between farms. And then I really might want to use a different hypervisor for a VM that was developed on oVirt, e.g. to put it on a laptop.

I've been trying to find clues as to what's going on, but the OVA exports generally don't seem to be logged in any obvious place, and even the imports only seem to get logged when the OVAs need to be converted from a foreign format such as VirtualBox, where the entire, seemingly rather complex, process is logged in /var/log/vdsm/import/.

Am I the only one trying to use OVA export/import, or is that part of standard QA testing?
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/MLFPYBMPWFPJS6LYOUKTZHTZHPD52JYO/


[ovirt-users] Re: VM locked and script unlock_entity.sh don't show any problem

2020-07-29 Thread Thomas BURGUIERE
We've managed to restore a normal state for the locked VMs with two simple steps:
- moving the SPM role to another node
- restarting ovirt-engine on the oVirt engine host
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/7HDZWR2YG3YKUWCEAH6X4TFLE5UMO35B/


[ovirt-users] VM locked and script unlock_entity.sh don't show any problem

2020-07-28 Thread Thomas BURGUIERE
The web interface shows that the VM is locked, but when we run unlock_entity.sh, no problem is reported.

How we think the VMs got locked:
There were filesystem issues (a shortage of disk space on the oVirt engine host) during a backup sequence; the consequences were only discovered a few days later.
The VMs are backed up this way:
https://github.com/wefixit-AT/oVirtBackup

The ovirt-engine logs tell us that the Postgres DB encountered problems like this one during the backup process:
Message: PL/pgSQL function updatevdsdynamic(integer,integer,character 
varying,numeric,character 
varying,boolean,integer,integer,integer,uuid,integer,integer,integer,integer,integer,integer,integer,integer,character
 varying,character varying,character varying,character 
varying,integer,character varying,integer,integer,integer,boolean,character 
varying,character varying,character varying,character varying,character 
varying,character varying,character varying,character varying,character 
varying,character varying,character varying,integer,text,integer,character 
varying,character varying,character varying,character varying,character 
varying,character varying,text,text,smallint,integer,smallint,boolean,character 
varying,text,text,boolean,boolean,text,character 
varying,boolean,uuid,boolean,jsonb) line 4 at SQL statement; nested exception 
is org.postgresql.util.PSQLException: ERROR: could not extend file 
"base/16385/190997": No space left on device

Later, we saw that those VMs have had a lock icon in the status column since this episode. It did not appear immediately, but maybe a few hours later.

Those locked VMs turn out to be limited in that there are numerous things we cannot do with them, such as:
- edit parameters (e.g. memory)
- start them again after they were halted by a Unix command (halt -p)
- copy the related disk

The version of oVirt used is 4.2.1.

Is there another way to unlock the VMs?
Or a way to get unlock_entity.sh to work?
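
For reference, this is roughly how we have been calling the script; the -t/-q flags are how I understand the usage output on our 4.2 engine, so please double-check against -h on your version:
cd /usr/share/ovirt-engine/setup/dbutils
# show what the script considers locked, without changing anything
./unlock_entity.sh -t all -q
# print the exact options supported by your engine version
./unlock_entity.sh -h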
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/JJJ3VWJTA2DWW6ZICZZJBJQU7PTBYAW3/


[ovirt-users] Re: NVMe-oF

2020-06-27 Thread thomas
Somehow I get the feeling that what you have in mind is a turnkey solution that you can sell on directly...

From what I have gathered, Red Hat prefers to work with layers of abstraction. NVMe, whether it's direct-attached or over a nice (Mellanox-operated dual-personality?) fabric, will have to fit an existing mold, in this case probably SAN, even if there is no reason you couldn't attach disks via an NVMe-oF fabric and export them, perhaps even across the same fabric, as Gluster bricks.

If you want something better integrated, you have hooks and you have APIs to 
make oVirt do what you want from a fabric manager you implement yourself.

But that is just my very personal opinion.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/EEQVGLGX265V2UZ75GUC7AXJBGW3MECB/


[ovirt-users] Re: engine failure

2020-06-27 Thread thomas
Not sure I understand your interest in an older repo. I am still regularly doing oVirt 4.3 installs simply by not choosing the 4.4 repo, which is exclusively CentOS 8 (while 4.3 is exclusively CentOS 7: you cannot mix).

Here is my current ovirt-4.3.repo:
[root@ yum.repos.d]# more ovirt-4.3.repo
[ovirt-4.3]
name=Latest oVirt 4.3 Release
#baseurl=https://resources.ovirt.org/pub/ovirt-4.3/rpm/el$releasever/
mirrorlist=https://resources.ovirt.org/pub/yum-repo/mirrorlist-ovirt-4.3-el$releasever
enabled=1
skip_if_unavailable=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-ovirt-4.3

I also don't know what you mean when you say that your "engine server failed" 
or that it is stand-alone:
Is it a non-HCI setup with the engine on an ordinary machine? I can't easily 
imagine how an oVirt update would manage to break that.

Is it a single-node HCI setup with a management engine VM? Those are very easy 
to break indeed, unless you carefully follow the "minor release upgrade guide" 
in the documentation.

If you have an HCI setup where your management VM failed during an update: I had such a problem myself and managed to fix it. Perhaps this helps:

My engine was very dead, because it was fenced right in the middle of an update. The engine would start as a VM (hosted-engine --vm-start); it wasn't paused, but it wouldn't react either. No network, no access to the console via hosted-engine --console or via virsh console. It looked like an early boot failure, but without the HostedEngine there seemed to be no way to get to its console...

I connected through the virsh backdoor (vdsm@ovirt/shibboleth are always good to remember) and managed to get a screenshot of the console, which showed me a GRUB boot error: the initial ramdisk for the new kernel had not finished building yet.
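
For the record, that step looked roughly like this (virsh prompts for the SASL credentials mentioned above):
# grab a console screenshot of the stuck engine VM
virsh screenshot HostedEngine /tmp/hosted-engine-console.png
# authenticate as vdsm@ovirt with password shibboleth when prompted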

So I needed a way to get to the grub menu of the HostedEngine while it was 
booting...

I managed by starting the HostedEngine in 'paused' mode (now I know why that option is there ;-). I then gave the machine a VNC console password (another command I never noticed before) and pointed a VNC viewer at the URL that yet another virsh command revealed (IP and relative port number of the console). With the VNC viewer connected, I unpaused the VM via 'virsh resume HostedEngine' and quickly jumped back to the viewer, where I was indeed able to boot an older kernel, reinstall the newer one and recover the HostedEngine VM.
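
The sequence, roughly and from memory, so treat it as a sketch rather than a recipe:
# start the engine VM but keep the vCPUs paused
hosted-engine --vm-start-paused
# set a temporary password for the VM console
hosted-engine --add-console-password
# find out where the console is listening (something like vnc://127.0.0.1:5900)
virsh -r domdisplay HostedEngine
# connect a VNC viewer to that address, then let the VM run
# (virsh asks for vdsm@ovirt/shibboleth again here)
virsh resume HostedEngine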

Much better than slaying dragons in one of the games my kids play on weekends, and a huge confidence builder.

There really isn't a ton going on in the HostedEngine VM; it is able to survive quite a bit of mishandling and resets most of the time. Accordingly, there is a good chance that nothing is broken there that cannot be fixed in standard Linux ways.

Good luck!
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/WXTKK3WLWUMCPKRHQV6ZA2BBYW3HBJVI/


[ovirt-users] Re: Does a slow Web proxy (can't run the update-check) ruin HA scores? (Solved, ...but really?)

2020-06-27 Thread thomas
I'd say you were close!

I tried fiddling with the penalties, but that didn't do anything good.
But once I found that hosted-engine --vm-status displays the score across the hosts, I saw that they were constantly very low; the 1600 gateway penalty seemed a proper match.
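
A one-liner sketch of what I used to keep an eye on the scores (the exact labels in the output may differ slightly between versions):
# print host name and current HA score for every host in the cluster
hosted-engine --vm-status | grep -Ei 'hostname|score'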

I then reinstalled the cluster, bypassing any dependencies on DNS, which may be 
a little slow as it's not under my control. I have fully-fleshed out /etc/hosts 
files to accelerate that, but those seem to be ignored sometimes, or only come 
into play when a DNS lookup has outright failed, not just taken too long.

In the Cockpit setup screen you get to choose whether you want to use DNS, ping or TCP for the liveness check, I guess for the ovirt-ha-agent or -broker, and I chose 'ping' there as well, which also makes the Cockpit screen immediately happy, while the 'dns' setting seems to take a long time.

With that, I see scores of 3400 all around, so I guess that nailed it. I've found the Python code that implements the ovirt-ha monitors, but I can't find a broker.conf file or any other entry where the mechanism is actually configured, so that I could change and test different settings without a re-installation.

I quite like the liberty a proper DNS might give me, in case I need to move 
networks again. Yet after this, I'm very motivated to go back to plain old 
hardwired IPv4.

Pretty confident it wasn't the missing package updates now (sorry guys!), but 
at least it got me looking in the proper direction...
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/QMVP4OP3HQO377AHDZQ2VLGO3BGQJVBC/

