[ovirt-users] Re: Gluster Install Fail again :(
OK, that's odd. Can you check the following:

On all nodes:
grep storage[1-3].private /etc/hosts
for i in {1..3}; do host storage${i}.private.net; done

On the first node:
gluster peer probe storage1.private.net
gluster peer probe storage2.private.net
gluster peer probe storage3.private.net
gluster pool list
gluster peer status

Best Regards,
Strahil Nikolov

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/IARG3H7ZIQDSOMJIY7VC3AAFKAOT3MCH/
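[Editor's note] The hosts-file check above can be scripted. A minimal sketch, assuming the storage node names from this thread; the file path is a parameter so the check can be tried against a copy of /etc/hosts:

```shell
# Sanity-check that every storage node name appears in a hosts file.
# Usage: check_hosts /etc/hosts  (or a copy of it)
check_hosts() {
    hosts_file="$1"
    rc=0
    for i in 1 2 3; do
        name="storage${i}.private.net"
        # -F: literal match, -w: whole-word, so storage1.private.net
        # does not match e.g. storage1.private.network
        if grep -qwF "$name" "$hosts_file"; then
            echo "OK: $name"
        else
            echo "MISSING: $name"
            rc=1
        fi
    done
    return $rc
}
```

Running it on each node alongside the `host` loop quickly shows whether one node has a stale or missing /etc/hosts entry.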
[ovirt-users] Re: Gluster Install Fail again :(
gluster pool list
UUID					Hostname	State
1d17652f-f567-4a6d-9953-e0908ef5e361	localhost	Connected

gluster pool list
UUID					Hostname	State
612be7ce-6673-433e-ac86-bcca93636d64	localhost	Connected

gluster pool list
UUID					Hostname	State
772faa4f-44d4-45a7-8524-a7963798757b	localhost	Connected

gluster peer status
Number of Peers: 0

cat cmd_history.log
[2021-10-29 17:33:22.934750]  : peer probe storage1.private.net : SUCCESS : Probe on localhost not needed
[2021-10-29 17:33:23.162993]  : peer probe storage2.private.net : SUCCESS
[2021-10-29 17:33:23.498094]  : peer probe storage3.private.net : SUCCESS
[2021-10-29 17:33:24.918421]  : volume create engine replica 3 transport tcp storage1.private.net:/gluster_bricks/engine/engine storage2.private.net:/gluster_bricks/engine/engine storage3.private.net:/gluster_bricks/engine/engine force : FAILED : Staging failed on storage3.private.net. Error: Host storage1.private.net not connected Staging failed on storage2.private.net. Error: Host storage1.private.net not connected
[2021-10-29 17:33:28.226387]  : peer probe storage1.private.net : SUCCESS : Probe on localhost not needed
[2021-10-29 17:33:30.618435]  : volume create data replica 3 transport tcp storage1.private.net:/gluster_bricks/data/data storage2.private.net:/gluster_bricks/data/data storage3.private.net:/gluster_bricks/data/data force : FAILED : Staging failed on storage2.private.net. Error: Host storage1.private.net not connected Staging failed on storage3.private.net. Error: Host storage1.private.net not connected
[2021-10-29 17:33:33.923032]  : peer probe storage1.private.net : SUCCESS : Probe on localhost not needed
[2021-10-29 17:33:38.656356]  : volume create vmstore replica 3 transport tcp storage1.private.net:/gluster_bricks/vmstore/vmstore storage2.private.net:/gluster_bricks/vmstore/vmstore storage3.private.net:/gluster_bricks/vmstore/vmstore force : FAILED : Staging failed on storage3.private.net. Error: Host storage1.private.net not connected Staging failed on storage2.private.net. Error: Host storage1.private.net is not in 'Peer in Cluster' state
[2021-10-29 17:49:40.696944]  : peer detach storage2.private.net : SUCCESS
[2021-10-29 17:49:43.787922]  : peer detach storage3.private.net : SUCCESS

OK, this is what I have so far; still looking for the complete ansible log.

Brad

From: Strahil Nikolov
Sent: October 30, 2021 10:27 AM
To: ad...@foundryserver.com; users@ovirt.org
Subject: Re: [ovirt-users] Gluster Install Fail again :(

What is the output of:
gluster peer list
(from all nodes)

Output from the ansible will be useful.

Best Regards,
Strahil Nikolov

I have been working on getting this up and running for about a week now and I am totally frustrated. I am not sure even where to begin. Here is the error I get when it fails:

TASK [gluster.features/roles/gluster_hci : Create the GlusterFS volumes] ***
An exception occurred during task execution. To see the full traceback, use -vvv.
The error was: NoneType: None
failed: [storage1.private.net] (item={'volname': 'engine', 'brick': '/gluster_bricks/engine/engine', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": false, "item": {"arbiter": 0, "brick": "/gluster_bricks/engine/engine", "volname": "engine"}, "msg": "error running gluster (/usr/sbin/gluster --mode=script volume create engine replica 3 transport tcp storage1.private.net:/gluster_bricks/engine/engine storage2.private.net:/gluster_bricks/engine/engine storage3.private.net:/gluster_bricks/engine/engine force) command (rc=1): volume create: engine: failed: Staging failed on storage3.private.net. Error: Host storage1.private.net not connected\nStaging failed on storage2.private.net. Error: Host storage1.private.net not connected\n"}
An exception occurred during task execution. To see the full traceback, use -vvv.
The error was: NoneType: None
failed: [storage1.private.net] (item={'volname': 'data', 'brick': '/gluster_bricks/data/data', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": false, "item": {"arbiter": 0, "brick": "/gluster_bricks/data/data", "volname": "data"}, "msg": "error running gluster (/usr/sbin/gluster --mode=script volume create data replica 3 transport tcp storage1.private.net:/gluster_bricks/data/data storage2.private.net:/gluster_bricks/data/data storage3.private.net:/gluster_bricks/data/data force) command (rc=1): volume create: data: failed: Staging failed on storage2.private.net. Error: Host storage1.private.net not connected\nStaging failed on storage3.private.net. Error: Host storage1.private.net not connected\n"}
An exception occurred during task execution. To see the full traceback, use -vvv.
The error was: NoneType: None
failed: [storage1.private.net] (item={'volname':
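[Editor's note] The pattern in the log above — every `peer probe` reports SUCCESS yet `gluster peer status` shows `Number of Peers: 0` on every node — usually means the probes never took effect cluster-wide (e.g. each node only ever probed itself, or TCP port 24007 is blocked between nodes). A hedged recovery sketch: the function only prints the detach/re-probe command sequence for review, so it can be inspected (or piped to `sh` on the first node) rather than run blindly:

```shell
# Print (not run) a detach + re-probe sequence for the given peers.
# Run the output on ONE node only, after confirming port 24007/tcp is
# open between all nodes. Pipe to `sh` to actually execute.
print_reprobe_sequence() {
    for peer in "$@"; do
        echo "gluster peer detach $peer force"
    done
    for peer in "$@"; do
        echo "gluster peer probe $peer"
    done
    echo "gluster peer status"
}
```

For this thread that would be `print_reprobe_sequence storage2.private.net storage3.private.net` on storage1 (a node never probes itself).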
[ovirt-users] Re: question about engine deployment success rate
Hi Strahil,

Thanks for your reply!

Since we are using 4.3.9 from the installation ISO, issues regarding CentOS 8 rolling releases do not apply here. We will do that upgrade sooner or later, but then I would try to use local yum repository mirrors that are always snapshots. More or less how you suggest…

So my success rate of 4 in 10 attempts has nothing to do with changing package or oVirt versions. It happens when applying exactly the same procedure, scripts, and installation media, on the exact same machine. So the input is always the same, but the result is at least 2 different error states in 6 cases, and only 4 times a working system.

I would like to understand if that is normal and everybody needs their handful of attempts to deploy the engine until it succeeds, or if this is a sign that we still do something wrong that we need to track down and fix.

Thanks,
Henning

On Sat 30. Oct 2021 at 14:31, Strahil Nikolov wrote:
> If you want to increase your deployment success, you will need to use
> repository management and freeze your OS & oVirt repos to a working level.
> For example, if you use RHEL 8.4 and the current level of oVirt, you will
> have dependency issues until RHEL 8.5 is released.
>
> Once this happens and your build succeeds, you will lock your repos and
> deploy from them again. Once you test each 'batch' of repos and confirm
> they work for you, you will create a new set of repos ...
>
> oVirt is a dynamic project with new features constantly coming up (and
> sometimes going away).
>
> Another approach is to use the oVirt Node image, which is based on CentOS
> Stream and is validated in the dev infrastructure.
>
> Best Regards,
> Strahil Nikolov
>
> On Wed, Oct 27, 2021 at 21:39, Henning Sprang wrote:
>
> Hello,
>
> I've just inherited a project where we need to bring a prototype of a
> small oVirt system (single node or 3-node hyperconverged, with
> GlusterFS on the same machine, a bunch of different VMs) running in
> an industrial machine into serial production.
>
> This means we want to build a new 1- or 3-node oVirt system each day,
> up to 3 times a day.
>
> In my tests so far, the failure rate of the oVirt engine deployment
> (via the included scripts as well as the web UI) turns out to be
> pretty high - it's between 40-60%, meaning until we have a running
> system, we have to try the installation and/or final engine
> deployment about 2-4 times until we are successful.
>
> So far I could not identify clear error messages that would tell me how
> to fix the problem.
>
> Before going into details of the errors, I would like to ask whether people
> deeper into oVirt would consider this a somewhat normal success rate,
> or whether this indicates we are doing something generally wrong and we
> should definitely spend a few more hours or maybe days finding the
> sources of the problems.
>
> More info about the system and errors:
>
> * oVirt 4.3.9 (because the prototype was made and verified with that
> version - it would be interesting to know, too, if upgrading is strongly
> recommended for a more stable installation/deployment)
> * The errors alternate between the deployment process seemingly not
> being able to transfer the "LocalHostedEngine" VM to the GlusterFS
> storage to become a "HostedEngine", and the engine being already up and
> running, but never really connecting to the oVirt system, continuously
> restarting, and also showing XFS filesystem errors in its dmesg output.
>
> Any hints on our chances of getting this solved, or requests for more
> information about the error, are welcome - thanks in advance.
>
> Henning

-- 
Henning Sprang
http://www.sprang.de

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/UVSWYUG4S6Y6SODFBEOBK3VBA5STHSEL/
[ovirt-users] Re: question about engine deployment success rate
Well, it (the install process) requires some polishing.

Any reason not to use 4.3.10? It is the only supported version for migration to 4.4.

Can you share what the errors were?

Best Regards,
Strahil Nikolov

On Sat, Oct 30, 2021 at 20:25, Henning Sprang wrote:

Hi Strahil,

Thanks for your reply!

Since we are using 4.3.9 from the installation ISO, issues regarding CentOS 8 rolling releases do not apply here. We will do that upgrade sooner or later, but then I would try to use local yum repository mirrors that are always snapshots. More or less how you suggest…

So my success rate of 4 in 10 attempts has nothing to do with changing package or oVirt versions. It happens when applying exactly the same procedure, scripts, and installation media, on the exact same machine. So the input is always the same, but the result is at least 2 different error states in 6 cases, and only 4 times a working system.

I would like to understand if that is normal and everybody needs their handful of attempts to deploy the engine until it succeeds, or if this is a sign that we still do something wrong that we need to track down and fix.

Thanks,
Henning

On Sat 30. Oct 2021 at 14:31, Strahil Nikolov wrote:

If you want to increase your deployment success, you will need to use repository management and freeze your OS & oVirt repos to a working level. For example, if you use RHEL 8.4 and the current level of oVirt, you will have dependency issues until RHEL 8.5 is released.

Once this happens and your build succeeds, you will lock your repos and deploy from them again. Once you test each 'batch' of repos and confirm they work for you, you will create a new set of repos ...

oVirt is a dynamic project with new features constantly coming up (and sometimes going away).

Another approach is to use the oVirt Node image, which is based on CentOS Stream and is validated in the dev infrastructure.

Best Regards,
Strahil Nikolov

On Wed, Oct 27, 2021 at 21:39, Henning Sprang wrote:

Hello,

I've just inherited a project where we need to bring a prototype of a small oVirt system (single node or 3-node hyperconverged, with GlusterFS on the same machine, a bunch of different VMs) running in an industrial machine into serial production.

This means we want to build a new 1- or 3-node oVirt system each day, up to 3 times a day.

In my tests so far, the failure rate of the oVirt engine deployment (via the included scripts as well as the web UI) turns out to be pretty high - it's between 40-60%, meaning until we have a running system, we have to try the installation and/or final engine deployment about 2-4 times until we are successful.

So far I could not identify clear error messages that would tell me how to fix the problem.

Before going into details of the errors, I would like to ask whether people deeper into oVirt would consider this a somewhat normal success rate, or whether this indicates we are doing something generally wrong and we should definitely spend a few more hours or maybe days finding the sources of the problems.

More info about the system and errors:

* oVirt 4.3.9 (because the prototype was made and verified with that version - it would be interesting to know, too, if upgrading is strongly recommended for a more stable installation/deployment)
* The errors alternate between the deployment process seemingly not being able to transfer the "LocalHostedEngine" VM to the GlusterFS storage to become a "HostedEngine", and the engine being already up and running, but never really connecting to the oVirt system, continuously restarting, and also showing XFS filesystem errors in its dmesg output.

Any hints on our chances of getting this solved, or requests for more information about the error, are welcome - thanks in advance.

Henning

-- 
Henning Sprang
http://www.sprang.de

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/7SWGEE5GLGPFJCKTYGOQBQ6GBJDSSGZX/
[ovirt-users] Re: Gluster Install Fail again :(
What is the output of:
gluster peer list
(from all nodes)

Output from the ansible will be useful.

Best Regards,
Strahil Nikolov

I have been working on getting this up and running for about a week now and I am totally frustrated. I am not sure even where to begin. Here is the error I get when it fails:

TASK [gluster.features/roles/gluster_hci : Create the GlusterFS volumes] ***
An exception occurred during task execution. To see the full traceback, use -vvv.
The error was: NoneType: None
failed: [storage1.private.net] (item={'volname': 'engine', 'brick': '/gluster_bricks/engine/engine', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": false, "item": {"arbiter": 0, "brick": "/gluster_bricks/engine/engine", "volname": "engine"}, "msg": "error running gluster (/usr/sbin/gluster --mode=script volume create engine replica 3 transport tcp storage1.private.net:/gluster_bricks/engine/engine storage2.private.net:/gluster_bricks/engine/engine storage3.private.net:/gluster_bricks/engine/engine force) command (rc=1): volume create: engine: failed: Staging failed on storage3.private.net. Error: Host storage1.private.net not connected\nStaging failed on storage2.private.net. Error: Host storage1.private.net not connected\n"}
An exception occurred during task execution. To see the full traceback, use -vvv.
The error was: NoneType: None
failed: [storage1.private.net] (item={'volname': 'data', 'brick': '/gluster_bricks/data/data', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": false, "item": {"arbiter": 0, "brick": "/gluster_bricks/data/data", "volname": "data"}, "msg": "error running gluster (/usr/sbin/gluster --mode=script volume create data replica 3 transport tcp storage1.private.net:/gluster_bricks/data/data storage2.private.net:/gluster_bricks/data/data storage3.private.net:/gluster_bricks/data/data force) command (rc=1): volume create: data: failed: Staging failed on storage2.private.net. Error: Host storage1.private.net not connected\nStaging failed on storage3.private.net. Error: Host storage1.private.net not connected\n"}
An exception occurred during task execution. To see the full traceback, use -vvv.
The error was: NoneType: None
failed: [storage1.private.net] (item={'volname': 'vmstore', 'brick': '/gluster_bricks/vmstore/vmstore', 'arbiter': 0}) => {"ansible_loop_var": "item", "changed": false, "item": {"arbiter": 0, "brick": "/gluster_bricks/vmstore/vmstore", "volname": "vmstore"}, "msg": "error running gluster (/usr/sbin/gluster --mode=script volume create vmstore replica 3 transport tcp storage1.private.net:/gluster_bricks/vmstore/vmstore storage2.private.net:/gluster_bricks/vmstore/vmstore storage3.private.net:/gluster_bricks/vmstore/vmstore force) command (rc=1): volume create: vmstore: failed: Staging failed on storage3.private.net. Error: Host storage1.private.net not connected\nStaging failed on storage2.private.net. Error: Host storage1.private.net is not in 'Peer in Cluster' state\n"}

Here are the facts:
- using 4.4.9 of oVirt
- using the oVirt Node OS partition for gluster: /dev/vda4 > 4T in unformatted space
- able to ssh into each host on the private.net, and known hosts and FQDN pass fine
- On the volume page: all default settings
- On the bricks page: JBOD / Blacklist true / storage host storage1.private.net / default LVM, except the device is /dev/sda4

I really need to get this set up. The first failure was the filter error, so I edited /etc/lvm/lvm.conf to comment out the filter line. Then, without doing a clean-up, I reran the deployment and got the above error.

Thanks in advance,
Brad

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/ONOEUHHYL6YVIAREMQQLDWSUY3PR2RWO/
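[Editor's note] On the lvm.conf edit mentioned in the message above: commenting out the filter line by hand is error-prone, and it is worth keeping a backup so the deployment's original filter can be restored for a clean re-run. A minimal sketch of that edit as a repeatable operation (the path is a parameter so it can be tried on a copy first; this illustrates the edit, not the exact filter line oVirt's deployment writes):

```shell
# Comment out any active (uncommented) "filter = ..." line in an
# lvm.conf-style file, keeping a .bak backup of the original.
# Operates on the given path, so test against a copy first.
disable_lvm_filter() {
    conf="$1"
    # "&" re-inserts the matched text, so the line keeps its content,
    # just prefixed with "# ".
    sed -i.bak 's/^[[:space:]]*filter[[:space:]]*=/# &/' "$conf"
}
```

After such an edit, re-running the deployment without a cleanup (as described above) can leave half-created peers behind, which matches the "not in 'Peer in Cluster' state" failure.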
[ovirt-users] Re: Q: oVirt guest agent + spice-vdagent on Debian 11 Bullseye
You need qemu-guest-agent, as the oVirt guest agent is no longer needed, nor available.

Best Regards,
Strahil Nikolov

On Fri, Oct 29, 2021 at 12:33, Andrei Verovski wrote:

Hi,

Has anyone compiled these deb packages for Debian 11 Bullseye?
oVirt guest agent + spice-vdagent

Packages from Buster can’t be installed on Bullseye because of broken libnl dependencies.

Thanks in advance.
Andrei

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/SCHHN6ZDXWBQVEZYGBAXTPCUHWHFYHSB/
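[Editor's note] On Debian 11 the replacement amounts to installing qemu-guest-agent (and spice-vdagent for SPICE consoles) inside the guest. A small sketch that reports which of those packages are missing; the `dpkg-query`-style status text is read from stdin, so the logic can be exercised outside a Debian guest:

```shell
# Report which required guest packages are missing, given
# "dpkg-query -W -f '${Package} ${Status}\n'" style output on stdin.
missing_guest_agents() {
    status="$(cat)"
    for pkg in qemu-guest-agent spice-vdagent; do
        # An installed package reports status "install ok installed".
        if ! printf '%s\n' "$status" | grep -q "^$pkg .*install ok installed"; then
            echo "missing: $pkg"
        fi
    done
}
```

In a real guest: `dpkg-query -W -f '${Package} ${Status}\n' | missing_guest_agents`, then `apt install` whatever it lists.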
[ovirt-users] Re: Engine VM FQDN will not validate.
Do you have a PTR record for the engine's IP?

Best Regards,
Strahil Nikolov

Hello,

I am trying to install the Hosted Engine using the wizard. I am NOT using the hyperconverged setup. When I add the FQDN vmengine1.domain.com I get the error: "localhost is not a valid address". When I add the host FQDN in the advanced area it validates; if I put vmengine1.domain.com in the host section it validates.

Facts:
- I have the domain in DNS, confirmed by dig to resolve to the public IP of the host.
- I have the domain in the hosts file, confirmed by using ping and dig.
- I have generated an ssh key for the root user on the host and I have added it to the known hosts file.
- I have logged in via ssh to all forms of addressing, so the known_hosts file has been updated:

localhost ecdsa-sha2-nistp256
host1 ecdsa-sha2-nistp256
host1.localhost ecdsa-sha2-nistp256
vmengine.localhost ecdsa-sha2-nistp256
host1.domain.com ecdsa-sha2-nistp256
vmengine1.domain.com ecdsa-sha2-nistp256

The reading suggests that the error doesn't mean what it says, but rather something about ssh not being able to log in to the host. I have been able to log in to the host with ssh manually from the CLI.

Any help would be greatly appreciated.

Thanks,
Brad

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/7GZNP2SAE5HMTQ67BO5RCBZ2Y5CVH45A/
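[Editor's note] Strahil's PTR question can be answered by comparing `dig +short <fqdn>` with `dig +short -x <ip>`. A tiny sketch of the consistency check; the lookup results are passed in as arguments so the logic is testable without a resolver (the names below are illustrative):

```shell
# Given an FQDN, the address it forward-resolves to, and the name the
# reverse (PTR) lookup of that address returns, report whether forward
# and reverse DNS agree.
check_ptr_consistency() {
    fqdn="$1" forward_ip="$2" ptr_name="$3"
    # PTR answers usually carry a trailing dot; strip it before comparing.
    if [ "${ptr_name%.}" = "$fqdn" ] && [ -n "$forward_ip" ]; then
        echo "OK: $fqdn <-> $forward_ip"
    else
        echo "MISMATCH: $fqdn forward=$forward_ip ptr=$ptr_name"
    fi
}
```

In practice: `ip=$(dig +short vmengine1.domain.com); check_ptr_consistency vmengine1.domain.com "$ip" "$(dig +short -x "$ip")"`. A mismatch here (e.g. the IP reverse-resolving to localhost) would explain a "localhost is not a valid address" style validation error.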
[ovirt-users] Re: question about engine deployment success rate
If you want to increase your deployment success, you will need to use repository management and freeze your OS & oVirt repos to a working level. For example, if you use RHEL 8.4 and the current level of oVirt, you will have dependency issues until RHEL 8.5 is released.

Once this happens and your build succeeds, you will lock your repos and deploy from them again. Once you test each 'batch' of repos and confirm they work for you, you will create a new set of repos ...

oVirt is a dynamic project with new features constantly coming up (and sometimes going away).

Another approach is to use the oVirt Node image, which is based on CentOS Stream and is validated in the dev infrastructure.

Best Regards,
Strahil Nikolov

On Wed, Oct 27, 2021 at 21:39, Henning Sprang wrote:

Hello,

I've just inherited a project where we need to bring a prototype of a small oVirt system (single node or 3-node hyperconverged, with GlusterFS on the same machine, a bunch of different VMs) running in an industrial machine into serial production.

This means we want to build a new 1- or 3-node oVirt system each day, up to 3 times a day.

In my tests so far, the failure rate of the oVirt engine deployment (via the included scripts as well as the web UI) turns out to be pretty high - it's between 40-60%, meaning until we have a running system, we have to try the installation and/or final engine deployment about 2-4 times until we are successful.

So far I could not identify clear error messages that would tell me how to fix the problem.

Before going into details of the errors, I would like to ask whether people deeper into oVirt would consider this a somewhat normal success rate, or whether this indicates we are doing something generally wrong and we should definitely spend a few more hours or maybe days finding the sources of the problems.

More info about the system and errors:

* oVirt 4.3.9 (because the prototype was made and verified with that version - it would be interesting to know, too, if upgrading is strongly recommended for a more stable installation/deployment)
* The errors alternate between the deployment process seemingly not being able to transfer the "LocalHostedEngine" VM to the GlusterFS storage to become a "HostedEngine", and the engine being already up and running, but never really connecting to the oVirt system, continuously restarting, and also showing XFS filesystem errors in its dmesg output.

Any hints on our chances of getting this solved, or requests for more information about the error, are welcome - thanks in advance.

Henning

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/Z7SNEP7B5VIMWLYI6USES2U5UHG2Z73S/
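[Editor's note] Strahil's "freeze your repos" advice amounts to mirroring the repositories once (e.g. with `reposync`) into a dated snapshot directory and pointing every host at that snapshot. A hedged sketch that generates the `.repo` stanza for such a snapshot; the repo id, mirror URL, and date below are placeholders, not real oVirt mirror paths:

```shell
# Emit a yum/dnf .repo stanza that points at a frozen, dated snapshot
# of a repository. All arguments are illustrative placeholders;
# substitute your own internal mirror layout.
frozen_repo_stanza() {
    repo_id="$1" base_url="$2" snapshot="$3"
    cat <<EOF
[$repo_id-$snapshot]
name=$repo_id (frozen $snapshot)
baseurl=$base_url/$snapshot/$repo_id/
enabled=1
gpgcheck=1
EOF
}
```

Typical use, once `reposync` has populated the snapshot directory: `frozen_repo_stanza ovirt-4.3 http://mirror.internal/snapshots 2021-10-30 > /etc/yum.repos.d/ovirt-4.3-frozen.repo`. Every deployment then installs from an identical, tested package set, which directly addresses the "same input, different outcomes" problem described in this thread.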