[ovirt-users] Re: ovirt-4.3 hyperconverged deployment - no option for "disc count" for JBOD

2019-02-15 Thread feral
Let us know how the rest of the deployment goes. I ended up running into
another dozen or so issues that I managed to work around but the cluster
performance was abysmal and I didn't have time to figure out why.

On Fri, Feb 15, 2019, 9:04 PM Jason Brooks wrote:
> On Fri, Feb 15, 2019 at 8:41 PM Jason Brooks wrote:
> >
> > Did a bug ever get opened for this? I just hit it. My ovirt node 4.3 has
> > gdeploy-2.0.8-1.el7.noarch.
>
>
> I filed https://bugzilla.redhat.com/show_bug.cgi?id=1677827
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/47UFOBRK7SVJQRBB53QPB4CRZYKIJQU4/


[ovirt-users] Re: ovirt-4.3 hyperconverged deployment - no option for "disc count" for JBOD

2019-02-11 Thread feral
No. I gave up on ovirt 4.2 (node iso and centos) and moved over to the node
4.3 iso, and that's where I found the problem with the JBOD. I then moved
back to centos and installed 4.3 there, which comes with
gdeploy-2.0.8-1.el7.noarch. I don't know whether this version had the issue
or not; I assumed that, since I'm using the same ovirt/gluster repos, it
would probably have had the same problem.

I've since wiped out the cluster and have been testing a single ovirt-4.3
node (from centos) to try to rule out a few issues. Found that for some
reason, while my bare metal raid array gives me about 1GBps r/w (XFS on top
of LVM), as soon as I mount a brick and try to r/w to it, that performance
drops to between 5 and 55MBps depending on block size. The native XFS and
gluster XFS configs are identical: 1000MBps on the native XFS with 4k
blocks, and 5MBps on gluster with 4k blocks. I can get up to 55MBps on
gluster with 64k blocks. So something else isn't right, but I haven't had
any luck asking around, as everyone seems to feel this is just a matter
of incremental performance tweaks. I of course argue that all of the
recommended tweaks only gained 1-2% performance, and that we'd need about
800 of them to get decent performance.
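
(If anyone wants to reproduce the kind of comparison I'm describing, simple
dd runs against each mount point will show it; something like the following,
where the paths are only examples from my layout:

  # native XFS-on-LVM mount
  dd if=/dev/zero of=/data/ddtest bs=4k count=250000 oflag=direct conv=fsync

  # same write against the FUSE-mounted gluster volume
  dd if=/dev/zero of=/mnt/glustertest/ddtest bs=4k count=250000 oflag=direct conv=fsync

  # repeat both with bs=64k count=16000 for the larger-block numbers
)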

So at this point I'm not really sure what to do next. When I've used
gluster in the past (configured manually), I've never had a problem getting
at least 80-90% native performance. I've also noticed that for some reason,
ovirt runs my guests like complete garbage and I have no idea why. I get
about 1/20th to 1/50th of the performance from ovirt that I get on the same
hardware with boring qemu/kvm.
I'm just bringing up a single host (centos with 4.3) now, and trying local
storage instead of gluster, to see if there's any major change in overall
VM performance. Just uploading an ISO to the local storage, though, is
already about 80x faster than uploading it to a gluster volume.

And... somewhat confirmed. Using local storage (just a plain XFS file
system) showed a major improvement in overall guest VM performance. The disc
throughput is still very slow (160MBps vs the 1000MBps of the host), but
overall, the entire VM is much more responsive.
Meanwhile, testing a clone of the same VM on my 7-year-old laptop, which
pushes around 300MBps, the VM is pushing 280MBps on average. Both use XFS.

So why is ovirt's guest disc performance (native and gluster) so poor? Why
is it consistently giving me about 1/10th to 1/80th of the host's disc
throughput?




On Mon, Feb 11, 2019 at 5:01 AM Sahina Bose  wrote:

>
>
> On Wed, Feb 6, 2019 at 10:45 PM feral  wrote:
>
>> On that note, this was already reported several times a few months back,
>> but apparently was fixed in gdeploy-2.0.2-29.el7rhgs.noarch. I'm
>> guessing ovirt-node-4.3 just hasn't updated to that version yet?
>>
>
> +Niels de Vos  +Sachidananda URS 
> Any update on the gdeploy package update in CentOS ?
>
>
>> On Wed, Feb 6, 2019 at 8:51 AM feral  wrote:
>>
>>> Bugzilla must have lost my account. Requesting password reset on an
>>> account that doesn't exist, results in 500 :p.
>>> Created a new one.
>>>
>>> However, my options for gdeploy version do not match the version 2.0.2
>>> installed. They only list 0.xx.x.
>>>
>>> On Wed, Feb 6, 2019 at 8:44 AM Greg Sheremeta 
>>> wrote:
>>>
>>>>
>>>> On Wed, Feb 6, 2019 at 11:31 AM feral  wrote:
>>>>
>>>>> Sorry, I tried opening a bug, but RH's Bugzilla is having issues and
>>>>> 500's out when I try to request password reset.
>>>>>
>>>>
>>>> Sorry to hear that. I hate to ask, but could you try after clearing
>>>> cookies for *redhat.com, or try in an incognito tab? It's done that to
>>>> me before and clearing cookies worked for me. If it still doesn't work,
>>>> I'll open the bug. Thanks!
>>>>
>>>>
>>>>>
>>>>>
>>>>> ---
>>>>> PLAY [gluster_servers]
>>>>> *
>>>>>
>>>>> TASK [Run a shell script]
>>>>> **
>>>>> changed: [ovirt-431.localdomain] =>
>>>>> (item=/usr/share/gdeploy/scripts/grafton-sanity-check.sh -d vdb -h
>>>>> ovirt-431.localdomain, ovirt-432.localdomain, ovirt-433.localdomain)
>>>>>
>>>>> PLAY RECAP
>>>>> *

[ovirt-users] Re: ovirt-node-4.3, deployment fails when moving hosted engine vm to gluster storage.

2019-02-07 Thread feral
Update:

Removing all gluster mounts from /etc/fstab solves the boot problem. I am
then able to manually mount all gluster bricks and bring up glusterd
properly. I'm trying to deploy the hosted VM again now, but I suspect the
problem there, as well, is going to be that it's trying to mount a gluster
brick before bringing up networking.
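
If the mounts do have to go back into /etc/fstab, the plan would be to mark
them so systemd orders them properly instead of hanging boot; an untested
sketch, with the device, volume, and mount paths guessed from what the wizard
created here:

  # glusterfs client mount (engine storage domain): treat it as a network fs
  ovirt-431.localdomain:/engine  /mnt/engine  glusterfs  defaults,_netdev,x-systemd.automount  0 0

  # XFS brick on LVM-on-VDO: wait for vdo.service and don't hang boot forever
  /dev/gluster_vg_vdb/gluster_lv_data  /gluster_bricks/data  xfs  inode64,noatime,nodiratime,x-systemd.requires=vdo.service,x-systemd.device-timeout=30,nofail  0 0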

On Thu, Feb 7, 2019 at 8:37 AM feral  wrote:

> Which logs?
>
> The nodes hang on boot at "Started Flush Journal to Persistent Storage".
> This would be followed by gluster mounts coming up (before networking,
> which still doesn't make sense to me...) but they of course all fail as
> networking is down.
> The gluster logs, post node failure, simply state that all connection
> attempts are failing (because networking is down).
>
> I managed to get networking online manually and push logs OUT (can't start
> sshd as that causes a reboot).
> https://drive.google.com/open?id=1Kdb2pRUC0O-5u3ZkA3KT0qvAQIpv9SZm
>
> It seems to me that some vital systemd service must be failing on the
> nodes (and possibly that's what's happening on the VM as well?)
>
> On Thu, Feb 7, 2019 at 8:25 AM Simone Tiraboschi 
> wrote:
>
>>
>>
>> On Thu, Feb 7, 2019 at 5:19 PM feral  wrote:
>>
>>> I've never managed to get a connection to the engine via VNC/Spice
>>> (works fine for my other hypervisors...)
>>>
>>> As I said, the network setup is super simple. All three nodes have 1
>>> interface each (eth0). They are all set with static IP's, with matching
>>> DHCP reservations on the DHCP server, with matching DNS. All nodes have
>>> entries in /etc/hosts on each machine. IP's are 192.168.1.195-7, and the
>>> engine VM gets 192.168.1.198. During the engine deployment, the VM does
>>> come up on 198. I can ping it and ssh into it, but at some point, the
>>> connection drops.
>>> So I'm not relying on DHCP or DNS at all. VM comes up where expected,
>>> for a while, then it goes to reboot to get transferred to the
>>> gluster_engine storage, and that's where it drops offline and never comes
>>> back.
>>>
>>> I did another round of deployment tests last night and discovered that
>>> the nodes all fail to boot immediately after the gluster deployment (not
>>> after VM deployment as I mistakenly stated earlier). So the nodes get in a
>>> bad state during gluster deployment. They stay online just fine and gluster
>>> works perfect, until the node tries to reboot (which it fails to do).
>>>
>>
>> So I suggest to focus on the gluster deployment; can you please share
>> gluster logs?
>>
>>
>>>
>>> Also, the networking I'm using is identical to my ovirt 4.2 setup. I'm
>>> using the same MAC addresses, IP's, and hostnames (4.2 cluster is offline
>>> when I'm trying 4.3). They are identical configurations other than the
>>> version of ovirt-node.
>>>
>>> On Thu, Feb 7, 2019 at 12:15 AM Simone Tiraboschi 
>>> wrote:
>>>
>>>>
>>>>
>>>> On Wed, Feb 6, 2019 at 11:07 PM feral  wrote:
>>>>
>>>>> I have no idea what's wrong at this point. Very vanilla install of 3
>>>>> nodes. Run the Hyperconverged wizard, completes fine. Run the engine
>>>>> deployment, takes hours, eventually fails with :
>>>>>
>>>>> [ INFO ] TASK [oVirt.hosted-engine-setup : Check engine VM health]
>>>>> [ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 120, "changed":
>>>>> true, "cmd": ["hosted-engine", "--vm-status", "--json"], "delta":
>>>>> "0:00:00.340985", "end": "2019-02-06 11:44:48.836431", "rc": 0, "start":
>>>>> "2019-02-06 11:44:48.495446", "stderr": "", "stderr_lines": [], "stdout":
>>>>> "{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, 
>>>>> \"extra\":
>>>>> \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=12994
>>>>> (Wed Feb 6 11:44:44
>>>>> 2019)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=12995 (Wed Feb 6
>>>>> 11:44:44
>>>>> 2019)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStop\\nstopped=False\\n\",
>>>>> \"hostname\": \"ovirt-431.localdomain\", \"host-id\&quo

[ovirt-users] Re: ovirt-node-4.3, deployment fails when moving hosted engine vm to gluster storage.

2019-02-07 Thread feral
Which logs?

The nodes hang on boot at "Started Flush Journal to Persistent Storage".
This would be followed by gluster mounts coming up (before networking,
which still doesn't make sense to me...) but they of course all fail as
networking is down.
The gluster logs, post node failure, simply state that all connection
attempts are failing (because networking is down).

I managed to get networking online manually and push logs OUT (can't start
sshd as that causes a reboot).
https://drive.google.com/open?id=1Kdb2pRUC0O-5u3ZkA3KT0qvAQIpv9SZm

It seems to me that some vital systemd service must be failing on the nodes
(and possibly that's what's happening on the VM as well?)
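
(Getting networking up by hand was just the usual iproute2 dance; roughly the
following on the first node, where the netmask and gateway are placeholders:

  ip link set eth0 up
  ip addr add 192.168.1.195/24 dev eth0
  ip route add default via 192.168.1.1

and then scp the logs off.)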

On Thu, Feb 7, 2019 at 8:25 AM Simone Tiraboschi 
wrote:

>
>
> On Thu, Feb 7, 2019 at 5:19 PM feral  wrote:
>
>> I've never managed to get a connection to the engine via VNC/Spice (works
>> fine for my other hypervisors...)
>>
>> As I said, the network setup is super simple. All three nodes have 1
>> interface each (eth0). They are all set with static IP's, with matching
>> DHCP reservations on the DHCP server, with matching DNS. All nodes have
>> entries in /etc/hosts on each machine. IP's are 192.168.1.195-7, and the
>> engine VM gets 192.168.1.198. During the engine deployment, the VM does
>> come up on 198. I can ping it and ssh into it, but at some point, the
>> connection drops.
>> So I'm not relying on DHCP or DNS at all. VM comes up where expected, for
>> a while, then it goes to reboot to get transferred to the gluster_engine
>> storage, and that's where it drops offline and never comes back.
>>
>> I did another round of deployment tests last night and discovered that
>> the nodes all fail to boot immediately after the gluster deployment (not
>> after VM deployment as I mistakenly stated earlier). So the nodes get in a
>> bad state during gluster deployment. They stay online just fine and gluster
>> works perfect, until the node tries to reboot (which it fails to do).
>>
>
> So I suggest to focus on the gluster deployment; can you please share
> gluster logs?
>
>
>>
>> Also, the networking I'm using is identical to my ovirt 4.2 setup. I'm
>> using the same MAC addresses, IP's, and hostnames (4.2 cluster is offline
>> when I'm trying 4.3). They are identical configurations other than the
>> version of ovirt-node.
>>
>> On Thu, Feb 7, 2019 at 12:15 AM Simone Tiraboschi 
>> wrote:
>>
>>>
>>>
>>> On Wed, Feb 6, 2019 at 11:07 PM feral  wrote:
>>>
>>>> I have no idea what's wrong at this point. Very vanilla install of 3
>>>> nodes. Run the Hyperconverged wizard, completes fine. Run the engine
>>>> deployment, takes hours, eventually fails with :
>>>>
>>>> [ INFO ] TASK [oVirt.hosted-engine-setup : Check engine VM health]
>>>> [ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 120, "changed":
>>>> true, "cmd": ["hosted-engine", "--vm-status", "--json"], "delta":
>>>> "0:00:00.340985", "end": "2019-02-06 11:44:48.836431", "rc": 0, "start":
>>>> "2019-02-06 11:44:48.495446", "stderr": "", "stderr_lines": [], "stdout":
>>>> "{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\":
>>>> \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=12994
>>>> (Wed Feb 6 11:44:44
>>>> 2019)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=12995 (Wed Feb 6
>>>> 11:44:44
>>>> 2019)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStop\\nstopped=False\\n\",
>>>> \"hostname\": \"ovirt-431.localdomain\", \"host-id\": 1, \"engine-status\":
>>>> {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\":
>>>> \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false,
>>>> \"maintenance\": false, \"crc32\": \"5474927a\", \"local_conf_timestamp\":
>>>> 12995, \"host-ts\": 12994}, \"global_maintenance\": false}",
>>>> "stdout_lines": ["{\"1\": {\"conf_on_shared_storage\": true, \"live-data\":
>>>> true, \"extra\":
>>>> \"me

[ovirt-users] Re: ovirt-node-4.3, deployment fails when moving hosted engine vm to gluster storage.

2019-02-07 Thread feral
I've never managed to get a connection to the engine via VNC/Spice (works
fine for my other hypervisors...)

As I said, the network setup is super simple. All three nodes have 1
interface each (eth0). They are all set with static IP's, with matching
DHCP reservations on the DHCP server, with matching DNS. All nodes have
entries in /etc/hosts on each machine. IP's are 192.168.1.195-7, and the
engine VM gets 192.168.1.198. During the engine deployment, the VM does
come up on 198. I can ping it and ssh into it, but at some point, the
connection drops.
So I'm not relying on DHCP or DNS at all. VM comes up where expected, for a
while, then it goes to reboot to get transferred to the gluster_engine
storage, and that's where it drops offline and never comes back.

I did another round of deployment tests last night and discovered that the
nodes all fail to boot immediately after the gluster deployment (not after
VM deployment as I mistakenly stated earlier). So the nodes get in a bad
state during gluster deployment. They stay online just fine and gluster
works perfect, until the node tries to reboot (which it fails to do).

Also, the networking I'm using is identical to my ovirt 4.2 setup. I'm
using the same MAC addresses, IP's, and hostnames (4.2 cluster is offline
when I'm trying 4.3). They are identical configurations other than the
version of ovirt-node.

On Thu, Feb 7, 2019 at 12:15 AM Simone Tiraboschi 
wrote:

>
>
> On Wed, Feb 6, 2019 at 11:07 PM feral  wrote:
>
>> I have no idea what's wrong at this point. Very vanilla install of 3
>> nodes. Run the Hyperconverged wizard, completes fine. Run the engine
>> deployment, takes hours, eventually fails with :
>>
>> [ INFO ] TASK [oVirt.hosted-engine-setup : Check engine VM health]
>> [ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 120, "changed":
>> true, "cmd": ["hosted-engine", "--vm-status", "--json"], "delta":
>> "0:00:00.340985", "end": "2019-02-06 11:44:48.836431", "rc": 0, "start":
>> "2019-02-06 11:44:48.495446", "stderr": "", "stderr_lines": [], "stdout":
>> "{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\":
>> \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=12994
>> (Wed Feb 6 11:44:44
>> 2019)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=12995 (Wed Feb 6
>> 11:44:44
>> 2019)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStop\\nstopped=False\\n\",
>> \"hostname\": \"ovirt-431.localdomain\", \"host-id\": 1, \"engine-status\":
>> {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\":
>> \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false,
>> \"maintenance\": false, \"crc32\": \"5474927a\", \"local_conf_timestamp\":
>> 12995, \"host-ts\": 12994}, \"global_maintenance\": false}",
>> "stdout_lines": ["{\"1\": {\"conf_on_shared_storage\": true, \"live-data\":
>> true, \"extra\":
>> \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=12994
>> (Wed Feb 6 11:44:44
>> 2019)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=12995 (Wed Feb 6
>> 11:44:44
>> 2019)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStop\\nstopped=False\\n\",
>> \"hostname\": \"ovirt-431.localdomain\", \"host-id\": 1, \"engine-status\":
>> {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\":
>> \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false,
>> \"maintenance\": false, \"crc32\": \"5474927a\", \"local_conf_timestamp\":
>> 12995, \"host-ts\": 12994}, \"global_maintenance\": false}"]}
>> [ INFO ] TASK [oVirt.hosted-engine-setup : Check VM status at virt level]
>> [ INFO ] changed: [localhost]
>> [ INFO ] TASK [oVirt.hosted-engine-setup : debug]
>> [ INFO ] ok: [localhost]
>> [ INFO ] TASK [oVirt.hosted-engine-setup : Fail if engine VM is not
>> running]
>> [ INFO ] skipping: [localhost]
>> [ INFO ] TASK [oVirt.hosted-engine-setup : Get target engine VM IP
>> address]
>> [ INFO ] changed: [localhost]
>> [ INFO ] TASK [oVir

[ovirt-users] ovirt-node 4.3 hyperconverged - fails on VM deployment - all nodes broken

2019-02-06 Thread feral
Boring 3-node vanilla deployment. Default options except for gluster volume
size. All three nodes are on static IPs on the LAN. All three have each other
listed in /etc/hosts as well as DNS.
The gluster wizard completes successfully; then I deploy the VM. VM deployment
fails after copying to the gluster storage. At that point, rebooting any node
results in the node not coming back to life. No networking, so no other
services. The only error I get says to check the engine.log, which I cannot
do as the engine fails to come up.

Anyone else seeing this with a vanilla install/deployment?

-- 
_
Fact:
1. Ninjas are mammals.
2. Ninjas fight ALL the time.
3. The purpose of the ninja is to flip out and kill people.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/4QDZX5DK4VC6LPGS3SOM2NR6KFW7QUA6/


[ovirt-users] Re: ovirt-node-4.3, deployment fails when moving hosted engine vm to gluster storage.

2019-02-06 Thread feral
Update: when the node is rebooted, it fails with "timed out waiting for
device dev-gluster_vg_vdb-gluster_lv_data.device".
The node also has no networking online, which is probably the cause of the
gluster failure.
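
(For anyone hitting the same hang: standard systemd debugging from the
emergency shell should show what actually failed and what is ordered before
what; roughly:

  systemctl list-units --state=failed
  systemctl list-dependencies --reverse dev-gluster_vg_vdb-gluster_lv_data.device
  journalctl -b -u glusterd -u vdo -u NetworkManager

nothing oVirt-specific in that.)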

On Wed, Feb 6, 2019 at 2:04 PM feral  wrote:

> I have no idea what's wrong at this point. Very vanilla install of 3
> nodes. Run the Hyperconverged wizard, completes fine. Run the engine
> deployment, takes hours, eventually fails with :
>
> [ INFO ] TASK [oVirt.hosted-engine-setup : Check engine VM health]
> [ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 120, "changed":
> true, "cmd": ["hosted-engine", "--vm-status", "--json"], "delta":
> "0:00:00.340985", "end": "2019-02-06 11:44:48.836431", "rc": 0, "start":
> "2019-02-06 11:44:48.495446", "stderr": "", "stderr_lines": [], "stdout":
> "{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\":
> \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=12994
> (Wed Feb 6 11:44:44
> 2019)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=12995 (Wed Feb 6
> 11:44:44
> 2019)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStop\\nstopped=False\\n\",
> \"hostname\": \"ovirt-431.localdomain\", \"host-id\": 1, \"engine-status\":
> {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\":
> \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false,
> \"maintenance\": false, \"crc32\": \"5474927a\", \"local_conf_timestamp\":
> 12995, \"host-ts\": 12994}, \"global_maintenance\": false}",
> "stdout_lines": ["{\"1\": {\"conf_on_shared_storage\": true, \"live-data\":
> true, \"extra\":
> \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=12994
> (Wed Feb 6 11:44:44
> 2019)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=12995 (Wed Feb 6
> 11:44:44
> 2019)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStop\\nstopped=False\\n\",
> \"hostname\": \"ovirt-431.localdomain\", \"host-id\": 1, \"engine-status\":
> {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\":
> \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false,
> \"maintenance\": false, \"crc32\": \"5474927a\", \"local_conf_timestamp\":
> 12995, \"host-ts\": 12994}, \"global_maintenance\": false}"]}
> [ INFO ] TASK [oVirt.hosted-engine-setup : Check VM status at virt level]
> [ INFO ] changed: [localhost]
> [ INFO ] TASK [oVirt.hosted-engine-setup : debug]
> [ INFO ] ok: [localhost]
> [ INFO ] TASK [oVirt.hosted-engine-setup : Fail if engine VM is not
> running]
> [ INFO ] skipping: [localhost]
> [ INFO ] TASK [oVirt.hosted-engine-setup : Get target engine VM IP address]
> [ INFO ] changed: [localhost]
> [ INFO ] TASK [oVirt.hosted-engine-setup : Get VDSM's target engine VM
> stats]
> [ INFO ] changed: [localhost]
> [ INFO ] TASK [oVirt.hosted-engine-setup : Convert stats to JSON format]
> [ INFO ] ok: [localhost]
> [ INFO ] TASK [oVirt.hosted-engine-setup : Get target engine VM IP address
> from VDSM stats]
> [ INFO ] ok: [localhost]
> [ INFO ] TASK [oVirt.hosted-engine-setup : debug]
> [ INFO ] ok: [localhost]
> [ INFO ] TASK [oVirt.hosted-engine-setup : Fail if Engine IP is different
> from engine's he_fqdn resolved IP]
> [ INFO ] skipping: [localhost]
> [ INFO ] TASK [oVirt.hosted-engine-setup : Fail is for any other reason
> the engine didn't started]
> [ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The
> engine failed to start inside the engine VM; please check engine.log."}
>
> ---
>
> I can't check the engine.log as I can't connect to the VM once this
> failure occurs. I can ssh in prior to the VM being moved to gluster
> storage, but as soon as it starts doing so, the VM never comes back online.
>
>
> --
> _
> Fact:
> 1. Ninjas are mammals.
> 2. Ninjas fight ALL the time.
> 3. The purpose of the ninja is to flip out and kill people.
>


-- 
_
Fact:
1. Ninjas are mammals.
2. Ninjas fight ALL the time.
3. The purpose of the ninja is to flip out and kill people.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/NBBBT25VZDO5EXUKFUE2HUALDLVSAROB/


[ovirt-users] ovirt-node-4.3, deployment fails when moving hosted engine vm to gluster storage.

2019-02-06 Thread feral
I have no idea what's wrong at this point. Very vanilla install of 3 nodes.
Run the Hyperconverged wizard, completes fine. Run the engine deployment,
takes hours, eventually fails with :

[ INFO ] TASK [oVirt.hosted-engine-setup : Check engine VM health]
[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 120, "changed": true,
"cmd": ["hosted-engine", "--vm-status", "--json"], "delta":
"0:00:00.340985", "end": "2019-02-06 11:44:48.836431", "rc": 0, "start":
"2019-02-06 11:44:48.495446", "stderr": "", "stderr_lines": [], "stdout":
"{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\":
\"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=12994
(Wed Feb 6 11:44:44
2019)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=12995 (Wed Feb 6
11:44:44
2019)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStop\\nstopped=False\\n\",
\"hostname\": \"ovirt-431.localdomain\", \"host-id\": 1, \"engine-status\":
{\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\":
\"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false,
\"maintenance\": false, \"crc32\": \"5474927a\", \"local_conf_timestamp\":
12995, \"host-ts\": 12994}, \"global_maintenance\": false}",
"stdout_lines": ["{\"1\": {\"conf_on_shared_storage\": true, \"live-data\":
true, \"extra\":
\"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=12994
(Wed Feb 6 11:44:44
2019)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=12995 (Wed Feb 6
11:44:44
2019)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStop\\nstopped=False\\n\",
\"hostname\": \"ovirt-431.localdomain\", \"host-id\": 1, \"engine-status\":
{\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\":
\"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false,
\"maintenance\": false, \"crc32\": \"5474927a\", \"local_conf_timestamp\":
12995, \"host-ts\": 12994}, \"global_maintenance\": false}"]}
[ INFO ] TASK [oVirt.hosted-engine-setup : Check VM status at virt level]
[ INFO ] changed: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : debug]
[ INFO ] ok: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Fail if engine VM is not running]
[ INFO ] skipping: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Get target engine VM IP address]
[ INFO ] changed: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Get VDSM's target engine VM
stats]
[ INFO ] changed: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Convert stats to JSON format]
[ INFO ] ok: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Get target engine VM IP address
from VDSM stats]
[ INFO ] ok: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : debug]
[ INFO ] ok: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Fail if Engine IP is different
from engine's he_fqdn resolved IP]
[ INFO ] skipping: [localhost]
[ INFO ] TASK [oVirt.hosted-engine-setup : Fail is for any other reason the
engine didn't started]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The
engine failed to start inside the engine VM; please check engine.log."}

---

I can't check the engine.log as I can't connect to the VM once this failure
occurs. I can ssh in prior to the VM being moved to gluster storage, but as
soon as it starts doing so, the VM never comes back online.


-- 
_
Fact:
1. Ninjas are mammals.
2. Ninjas fight ALL the time.
3. The purpose of the ninja is to flip out and kill people.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/LAM3TX64UQOOO7A2WBMGZUEF4TVFHXJA/


[ovirt-users] Re: ovirt-4.3 hyperconverged deployment - no option for "disc count" for JBOD

2019-02-06 Thread feral
On that note, this was already reported several times a few months back,
but apparently was fixed in gdeploy-2.0.2-29.el7rhgs.noarch. I'm guessing
ovirt-node-4.3 just hasn't updated to that version yet?

On Wed, Feb 6, 2019 at 8:51 AM feral  wrote:

> Bugzilla must have lost my account. Requesting password reset on an
> account that doesn't exist, results in 500 :p.
> Created a new one.
>
> However, my options for gdeploy version do not match the version 2.0.2
> installed. They only list 0.xx.x.
>
> On Wed, Feb 6, 2019 at 8:44 AM Greg Sheremeta  wrote:
>
>>
>> On Wed, Feb 6, 2019 at 11:31 AM feral  wrote:
>>
>>> Sorry, I tried opening a bug, but RH's Bugzilla is having issues and
>>> 500's out when I try to request password reset.
>>>
>>
>> Sorry to hear that. I hate to ask, but could you try after clearing
>> cookies for *redhat.com, or try in an incognito tab? It's done that to
>> me before and clearing cookies worked for me. If it still doesn't work,
>> I'll open the bug. Thanks!
>>
>>
>>>
>>>
>>> ---
>>> PLAY [gluster_servers]
>>> *
>>>
>>> TASK [Run a shell script]
>>> **
>>> changed: [ovirt-431.localdomain] =>
>>> (item=/usr/share/gdeploy/scripts/grafton-sanity-check.sh -d vdb -h
>>> ovirt-431.localdomain, ovirt-432.localdomain, ovirt-433.localdomain)
>>>
>>> PLAY RECAP
>>> *
>>> ovirt-431.localdomain  : ok=1changed=1unreachable=0
>>> failed=0
>>>
>>>
>>> PLAY [gluster_servers]
>>> *
>>>
>>> TASK [Run a shell script]
>>> **
>>> changed: [ovirt-432.localdomain] =>
>>> (item=/usr/share/gdeploy/scripts/grafton-sanity-check.sh -d vdb -h
>>> ovirt-431.localdomain, ovirt-432.localdomain, ovirt-433.localdomain)
>>>
>>> PLAY RECAP
>>> *
>>> ovirt-432.localdomain  : ok=1changed=1unreachable=0
>>> failed=0
>>>
>>>
>>> PLAY [gluster_servers]
>>> *
>>>
>>> TASK [Run a shell script]
>>> **
>>> changed: [ovirt-433.localdomain] =>
>>> (item=/usr/share/gdeploy/scripts/grafton-sanity-check.sh -d vdb -h
>>> ovirt-431.localdomain, ovirt-432.localdomain, ovirt-433.localdomain)
>>>
>>> PLAY RECAP
>>> *
>>> ovirt-433.localdomain  : ok=1changed=1unreachable=0
>>> failed=0
>>>
>>>
>>> PLAY [gluster_servers]
>>> *
>>>
>>> TASK [Create VDO with specified size]
>>> **
>>> changed: [ovirt-431.localdomain] => (item={u'disk': u'/dev/vdb',
>>> u'logicalsize': u'11000G', u'name': u'vdo_vdb'})
>>>
>>> PLAY RECAP
>>> *
>>> ovirt-431.localdomain  : ok=1changed=1unreachable=0
>>> failed=0
>>>
>>>
>>> PLAY [gluster_servers]
>>> *
>>>
>>> TASK [Create VDO with specified size]
>>> **
>>> changed: [ovirt-432.localdomain] => (item={u'disk': u'/dev/vdb',
>>> u'logicalsize': u'11000G', u'name': u'vdo_vdb'})
>>>
>>> PLAY RECAP
>>> *
>>> ovirt-432.localdomain  : ok=1changed=1unreachable=0
>>> failed=0
>>>
>>>
>>> PLAY [gluster_servers]
>>> *
>>>
>>> TASK [Create VDO with specified size]
>>> **
>>> changed: [ovirt-433.localdomain] => (item={u'disk'

[ovirt-users] Re: ovirt-4.3 hyperconverged deployment - no option for "disc count" for JBOD

2019-02-06 Thread feral
Bugzilla must have lost my account. Requesting a password reset on an account
that doesn't exist results in a 500 :p.
Created a new one.

However, my options for gdeploy version do not match the version 2.0.2
installed. They only list 0.xx.x.

On Wed, Feb 6, 2019 at 8:44 AM Greg Sheremeta  wrote:

>
> On Wed, Feb 6, 2019 at 11:31 AM feral  wrote:
>
>> Sorry, I tried opening a bug, but RH's Bugzilla is having issues and
>> 500's out when I try to request password reset.
>>
>
> Sorry to hear that. I hate to ask, but could you try after clearing
> cookies for *redhat.com, or try in an incognito tab? It's done that to me
> before and clearing cookies worked for me. If it still doesn't work, I'll
> open the bug. Thanks!
>
>
>>
>>
>> ---
>> PLAY [gluster_servers]
>> *
>>
>> TASK [Run a shell script]
>> **
>> changed: [ovirt-431.localdomain] =>
>> (item=/usr/share/gdeploy/scripts/grafton-sanity-check.sh -d vdb -h
>> ovirt-431.localdomain, ovirt-432.localdomain, ovirt-433.localdomain)
>>
>> PLAY RECAP
>> *
>> ovirt-431.localdomain  : ok=1changed=1unreachable=0
>> failed=0
>>
>>
>> PLAY [gluster_servers]
>> *
>>
>> TASK [Run a shell script]
>> **
>> changed: [ovirt-432.localdomain] =>
>> (item=/usr/share/gdeploy/scripts/grafton-sanity-check.sh -d vdb -h
>> ovirt-431.localdomain, ovirt-432.localdomain, ovirt-433.localdomain)
>>
>> PLAY RECAP
>> *
>> ovirt-432.localdomain  : ok=1changed=1unreachable=0
>> failed=0
>>
>>
>> PLAY [gluster_servers]
>> *
>>
>> TASK [Run a shell script]
>> **
>> changed: [ovirt-433.localdomain] =>
>> (item=/usr/share/gdeploy/scripts/grafton-sanity-check.sh -d vdb -h
>> ovirt-431.localdomain, ovirt-432.localdomain, ovirt-433.localdomain)
>>
>> PLAY RECAP
>> *
>> ovirt-433.localdomain  : ok=1changed=1unreachable=0
>> failed=0
>>
>>
>> PLAY [gluster_servers]
>> *
>>
>> TASK [Create VDO with specified size]
>> **
>> changed: [ovirt-431.localdomain] => (item={u'disk': u'/dev/vdb',
>> u'logicalsize': u'11000G', u'name': u'vdo_vdb'})
>>
>> PLAY RECAP
>> *
>> ovirt-431.localdomain  : ok=1changed=1unreachable=0
>> failed=0
>>
>>
>> PLAY [gluster_servers]
>> *
>>
>> TASK [Create VDO with specified size]
>> **
>> changed: [ovirt-432.localdomain] => (item={u'disk': u'/dev/vdb',
>> u'logicalsize': u'11000G', u'name': u'vdo_vdb'})
>>
>> PLAY RECAP
>> *
>> ovirt-432.localdomain  : ok=1changed=1unreachable=0
>> failed=0
>>
>>
>> PLAY [gluster_servers]
>> *
>>
>> TASK [Create VDO with specified size]
>> **
>> changed: [ovirt-433.localdomain] => (item={u'disk': u'/dev/vdb',
>> u'logicalsize': u'11000G', u'name': u'vdo_vdb'})
>>
>> PLAY RECAP
>> *
>> ovirt-433.localdomain  : ok=1changed=1unreachable=0
>> failed=0
>>
>>
>> PLAY [gluster_servers]
>> *
>>
>> TASK [Enable or disable services]
>> **
>> ok: [ovirt-432.localdomain] => (item=chronyd)
>> ok: [ovirt-431.localdomain] => (item=chronyd)
>> ok: [ovirt-433.l

[ovirt-users] Re: ovirt-4.3 hyperconverged deployment - no option for "disc count" for JBOD

2019-02-06 Thread feral
in  : ok=1  changed=1  unreachable=0  failed=0
ovirt-433.localdomain  : ok=1  changed=1  unreachable=0  failed=0


PLAY [gluster_servers]
*

TASK [Run a shell script]
**
changed: [ovirt-431.localdomain] =>
(item=/usr/share/gdeploy/scripts/blacklist_all_disks.sh)
changed: [ovirt-432.localdomain] =>
(item=/usr/share/gdeploy/scripts/blacklist_all_disks.sh)
changed: [ovirt-433.localdomain] =>
(item=/usr/share/gdeploy/scripts/blacklist_all_disks.sh)

PLAY RECAP
*
ovirt-431.localdomain  : ok=1  changed=1  unreachable=0  failed=0
ovirt-432.localdomain  : ok=1  changed=1  unreachable=0  failed=0
ovirt-433.localdomain  : ok=1  changed=1  unreachable=0  failed=0


PLAY [gluster_servers]
*

TASK [Clean up filesystem signature]
***
skipping: [ovirt-431.localdomain] => (item=/dev/mapper/vdo_vdb)

TASK [Create Physical Volume]
**
changed: [ovirt-431.localdomain] => (item=/dev/mapper/vdo_vdb)

PLAY RECAP
*
ovirt-431.localdomain  : ok=1  changed=1  unreachable=0  failed=0


PLAY [gluster_servers]
*

TASK [Clean up filesystem signature]
***
skipping: [ovirt-432.localdomain] => (item=/dev/mapper/vdo_vdb)

TASK [Create Physical Volume]
**
changed: [ovirt-432.localdomain] => (item=/dev/mapper/vdo_vdb)

PLAY RECAP
*
ovirt-432.localdomain  : ok=1  changed=1  unreachable=0  failed=0


PLAY [gluster_servers]
*

TASK [Clean up filesystem signature]
***
skipping: [ovirt-433.localdomain] => (item=/dev/mapper/vdo_vdb)

TASK [Create Physical Volume]
**
changed: [ovirt-433.localdomain] => (item=/dev/mapper/vdo_vdb)

PLAY RECAP
*
ovirt-433.localdomain  : ok=1  changed=1  unreachable=0  failed=0


PLAY [gluster_servers]
*

TASK [Create volume group on the disks]

changed: [ovirt-431.localdomain] => (item={u'brick':
u'/dev/mapper/vdo_vdb', u'vg': u'gluster_vg_vdb'})

PLAY RECAP
*
ovirt-431.localdomain  : ok=1  changed=1  unreachable=0  failed=0


PLAY [gluster_servers]
*

TASK [Create volume group on the disks]

changed: [ovirt-432.localdomain] => (item={u'brick':
u'/dev/mapper/vdo_vdb', u'vg': u'gluster_vg_vdb'})

PLAY RECAP
*
ovirt-432.localdomain  : ok=1  changed=1  unreachable=0  failed=0


PLAY [gluster_servers]
*

TASK [Create volume group on the disks]

changed: [ovirt-433.localdomain] => (item={u'brick':
u'/dev/mapper/vdo_vdb', u'vg': u'gluster_vg_vdb'})

PLAY RECAP
*
ovirt-433.localdomain  : ok=1  changed=1  unreachable=0  failed=0

Error: Section diskcount not found in the configuration file
---

So this is just using the default gluster options, except with JBOD
selected.
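
My next step will probably be to hand-edit the generated gdeployConfig.conf
and add the section it's complaining about before re-running the deployment.
Going by the gdeploy examples I've seen, something along these lines (the
values are guesses for a single-disk JBOD, not a tested config):

  [disktype]
  jbod

  [diskcount]
  1

  [stripesize]
  256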




On Wed, Feb 6, 2019 at 8:15 AM Greg Sheremeta  wrote:

> Please open a bug on this.
> https://bugzilla.redhat.com/enter_bug.cgi?product=cockpit-ovirt
> Component = Gdeploy
>
> Best wishes,
> Greg
>
>
> On Tue, Feb 5, 2019 at 3:20 PM feral  wrote:
>
>> Vanilla install of ovirt-node-4.3 (iso). During hyperconverged wizard,
>> there is no option for disc count for JBOD. This results in error during
>> deployment.
>> Disc Count option is available for all other raid levels.
>>
>> --
>> _
>> Fact:
>> 1. Ninjas are mammals.
>> 2. Ninjas fight ALL the time.
>> 3. The purpose of the ninja is to flip out and kill people.
>> ___
>> Users mailing list -- users@ovirt.org
>> To unsubscribe send an email to users-le...@ovirt.org

[ovirt-users] Re: rebooting an ovirt cluster

2019-02-05 Thread feral
Incidentally, at least so far, ovirt-node-4.3 is going much better (for the
installation anyway). The hyperconverged documentation does not mention
anything about setting up cockpit, though, so you have to manually enable
and start it, and if you're not using IPv6, you have to modify
cockpit.socket to forcibly enable IPv4.
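
For the record, the cockpit part is only a couple of commands plus a socket
override; roughly the following, where the drop-in path is just the standard
systemd override location, nothing oVirt-specific:

  systemctl enable --now cockpit.socket

  # /etc/systemd/system/cockpit.socket.d/listen.conf
  [Socket]
  ListenStream=
  ListenStream=0.0.0.0:9090

  systemctl daemon-reload && systemctl restart cockpit.socket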


On Tue, Feb 5, 2019 at 8:27 AM Sahina Bose  wrote:

>
>
> On Tue, Feb 5, 2019 at 7:23 AM Greg Sheremeta  wrote:
>
>>
>>
>> On Mon, Feb 4, 2019 at 4:15 PM feral  wrote:
>>
>>> I think I found the answer to glusterd not starting.
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1472267
>>>
>>> Apparently the version of gluster (3.12.15) that comes packaged with
>>> ovirt-node 4.2.8 has a known issue where gluster tries to come up before
>>> networking, fails, and crashes. This was fixed in gluster 3.13.0
>>> (apparently). Do devs paruse this list?
>>>
>>
>> Yes :)
>>
>>
>>> Any chance someone who can update the gluster package might read this?
>>>
>>
>> +Sahina might be able to help
>> The developers list is
>> https://lists.ovirt.org/archives/list/de...@ovirt.org/
>>
>
> On 4.2, we're stuck with glusterfs 3.12 due to dependency on gluster-gnfs.
>
> The bug you refer to is hit only when a hostname changes or one of the
> network interfaces is down and brick path cannot be resolved. What's the
> error in glusterd.log for the failure to start?
>
>
>>
>>> On Mon, Feb 4, 2019 at 2:38 AM Simone Tiraboschi 
>>> wrote:
>>>
>>>>
>>>>
>>>> On Sat, Feb 2, 2019 at 7:32 PM feral  wrote:
>>>>
>>>>> How is an oVirt hyperconverged cluster supposed to come back to life
>>>>> after a power outage to all 3 nodes?
>>>>>
>>>>> Running ovirt-node (ovirt-node-ng-installer-4.2.0-2019013006.el7.iso)
>>>>> to get things going, but I've run into multiple issues.
>>>>>
>>>>> 1. During the gluster setup, the volume sizes I specify, are not
>>>>> reflected in the deployment configuration. The auto-populated values are
>>>>> used every time. I manually hacked on the config to get the volume sizes
>>>>> correct. I also noticed if I create the deployment config with "sdb" by
>>>>> accident, but click back and change it to "vdb", again, the changes are 
>>>>> not
>>>>> reflected in the config.
>>>>> My deployment config does seem to work. All volumes are created
>>>>> (though the xfs options used don't make sense as you end up with stripe
>>>>> sizes that aren't a multiple of the block size).
>>>>> Once gluster is deployed, I deploy the hosted engine, and everything
>>>>> works.
>>>>>
>>>>> 2. Reboot all nodes. I was testing for power outage response. All
>>>>> nodes come up, but glusterd is not running (seems to have failed for some
>>>>> reason). I can manually restart glusterd on all nodes and it comes up and
>>>>> starts communicating normally. However, the engine does not come online. 
>>>>> So
>>>>> I figure out where it last lived, and try to start it manually through the
>>>>> web interface. This fails because vdsm-ovirtmgmt is not up. I figured out
>>>>> the correct way to start up the engine would be through the cli via
>>>>> hosted-engine --vm-start.
>>>>>
>>>>
>>>> This is not required at all.
>>>> Are you sure that your cluster is not set in global maintenance mode?
>>>> Can you please share /var/log/ovirt-hosted-engine-ha/agent.log and
>>>> broker.log from your hosts?
>>>>
>>>>
>>>>> This does work, but it takes a very long time, and it usually starts
>>>>> up on any node other than the one I told it to start on.
>>>>>
>>>>> So I guess two (or three) questions. What is the expected operation
>>>>> after a full cluster reboot (ie: in the event of a power failure)? Why
>>>>> doesn't the engine start automatically, and what might be causing glusterd
>>>>> to fail, when it can be restarted manually and works fine?
>>>>>
>>>>> --
>>>>> _
>>>>> Fact:
>>>>> 1. Ninjas are mammals.
>>>>> 2. Ninjas fight ALL the time.
>>>>> 3. The purpose of the ninja 

[ovirt-users] ovirt-4.3 hyperconverged deployment - no option for "disc count" for JBOD

2019-02-05 Thread feral
Vanilla install of ovirt-node-4.3 (iso). During the hyperconverged wizard,
there is no option for disc count for JBOD. This results in an error during
deployment.
The Disc Count option is available for all other RAID levels.

-- 
_
Fact:
1. Ninjas are mammals.
2. Ninjas fight ALL the time.
3. The purpose of the ninja is to flip out and kill people.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/7JLLJJM4D2RCOFQLKLW5AVJPUK24J3YY/


[ovirt-users] Re: ovirt-node 4.2 iso - hyperconverged wizard doesn't write gdeployConfig settings

2019-02-05 Thread feral
Using SystemD makes way more sense to me. I was just trying to use
ovirt-node as it was ... intended? Mainly because I have no idea how it all
works yet, so I've been trying to do the most stockish deployment possible,
following deployment instructions and not thinking I'm smarter than the
software :p.
I've given up on 4.2 for now, as 4.3 was just released, so giving that a
try now. Will report back. Hopefully 4.3 enlists systemd for stuff?

On Tue, Feb 5, 2019 at 4:33 AM Strahil Nikolov 
wrote:

> Dear Feral,
>
> > On that note, have you also had issues with gluster not restarting on
> > reboot, as well as all of the HA stuff failing on reboot after power loss?
> > Thus far, the only way I've got the cluster to come back to life, is to
> > manually restart glusterd on all nodes, then put the cluster back into
> > "not maintenance" mode, and then manually starting the hosted-engine vm.
> > This also fails after 2 or 3 power losses, even though the entire cluster
> > is happy through the first 2.
>
>
> About the gluster not starting - use systemd.mount unit files.
> here is my setup and for now works:
>
> [root@ovirt2 yum.repos.d]# systemctl cat gluster_bricks-engine.mount
> # /etc/systemd/system/gluster_bricks-engine.mount
> [Unit]
> Description=Mount glusterfs brick - ENGINE
> Requires = vdo.service
> After = vdo.service
> Before = glusterd.service
> Conflicts = umount.target
>
> [Mount]
> What=/dev/mapper/gluster_vg_md0-gluster_lv_engine
> Where=/gluster_bricks/engine
> Type=xfs
> Options=inode64,noatime,nodiratime
>
> [Install]
> WantedBy=glusterd.service
> [root@ovirt2 yum.repos.d]# systemctl cat gluster_bricks-engine.automount
> # /etc/systemd/system/gluster_bricks-engine.automount
> [Unit]
> Description=automount for gluster brick ENGINE
>
> [Automount]
> Where=/gluster_bricks/engine
>
> [Install]
> WantedBy=multi-user.target
> [root@ovirt2 yum.repos.d]# systemctl cat glusterd
> # /etc/systemd/system/glusterd.service
> [Unit]
> Description=GlusterFS, a clustered file-system server
> Requires=rpcbind.service gluster_bricks-engine.mount
> gluster_bricks-data.mount gluster_bricks-isos.mount
> After=network.target rpcbind.service gluster_bricks-engine.mount
> gluster_bricks-data.mount gluster_bricks-isos.mount
> Before=network-online.target
>
> [Service]
> Type=forking
> PIDFile=/var/run/glusterd.pid
> LimitNOFILE=65536
> Environment="LOG_LEVEL=INFO"
> EnvironmentFile=-/etc/sysconfig/glusterd
> ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid  --log-level
> $LOG_LEVEL $GLUSTERD_OPTIONS
> KillMode=process
> SuccessExitStatus=15
>
> [Install]
> WantedBy=multi-user.target
>
> # /etc/systemd/system/glusterd.service.d/99-cpu.conf
> [Service]
> CPUAccounting=yes
> Slice=glusterfs.slice
>
>
> Best Regards,
> Strahil Nikolov
>


-- 
_
Fact:
1. Ninjas are mammals.
2. Ninjas fight ALL the time.
3. The purpose of the ninja is to flip out and kill people.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/G4AE6YQHYL7XBTYNCLQPFQY6CY6C7YGX/


[ovirt-users] Re: rebooting an ovirt cluster

2019-02-05 Thread feral
ha! I was all worried I was completely tarding out as I've never heard of
4.3... Released two days ago...

On Mon, Feb 4, 2019 at 11:53 PM Sandro Bonazzola 
wrote:

>
>
> Il giorno mar 5 feb 2019 alle ore 02:53 Greg Sheremeta <
> gsher...@redhat.com> ha scritto:
>
>>
>>
>> On Mon, Feb 4, 2019 at 4:15 PM feral  wrote:
>>
>>> I think I found the answer to glusterd not starting.
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1472267
>>>
>>> Apparently the version of gluster (3.12.15) that comes packaged with
>>> ovirt-node 4.2.8 has a known issue where gluster tries to come up before
>>> networking, fails, and crashes. This was fixed in gluster 3.13.0
>>> (apparently).
>>>
>>
> May I suggest to upgrade to 4.3.0? It ships Gluster 5 which should include
> the needed fixes.
>
>
>
>> Do devs paruse this list?
>>>
>>
>> Yes :)
>>
>>
>>> Any chance someone who can update the gluster package might read this?
>>>
>>
>> +Sahina might be able to help
>> The developers list is
>> https://lists.ovirt.org/archives/list/de...@ovirt.org/
>>
>>
>>> On Mon, Feb 4, 2019 at 2:38 AM Simone Tiraboschi 
>>> wrote:
>>>
>>>>
>>>>
>>>> On Sat, Feb 2, 2019 at 7:32 PM feral  wrote:
>>>>
>>>>> How is an oVirt hyperconverged cluster supposed to come back to life
>>>>> after a power outage to all 3 nodes?
>>>>>
>>>>> Running ovirt-node (ovirt-node-ng-installer-4.2.0-2019013006.el7.iso)
>>>>> to get things going, but I've run into multiple issues.
>>>>>
>>>>> 1. During the gluster setup, the volume sizes I specify, are not
>>>>> reflected in the deployment configuration. The auto-populated values are
>>>>> used every time. I manually hacked on the config to get the volume sizes
>>>>> correct. I also noticed if I create the deployment config with "sdb" by
>>>>> accident, but click back and change it to "vdb", again, the changes are 
>>>>> not
>>>>> reflected in the config.
>>>>> My deployment config does seem to work. All volumes are created
>>>>> (though the xfs options used don't make sense as you end up with stripe
>>>>> sizes that aren't a multiple of the block size).
>>>>> Once gluster is deployed, I deploy the hosted engine, and everything
>>>>> works.
>>>>>
>>>>> 2. Reboot all nodes. I was testing for power outage response. All
>>>>> nodes come up, but glusterd is not running (seems to have failed for some
>>>>> reason). I can manually restart glusterd on all nodes and it comes up and
>>>>> starts communicating normally. However, the engine does not come online. 
>>>>> So
>>>>> I figure out where it last lived, and try to start it manually through the
>>>>> web interface. This fails because vdsm-ovirtmgmt is not up. I figured out
>>>>> the correct way to start up the engine would be through the cli via
>>>>> hosted-engine --vm-start.
>>>>>
>>>>
>>>> This is not required at all.
>>>> Are you sure that your cluster is not set in global maintenance mode?
>>>> Can you please share /var/log/ovirt-hosted-engine-ha/agent.log and
>>>> broker.log from your hosts?
>>>>
>>>>
>>>>> This does work, but it takes a very long time, and it usually starts
>>>>> up on any node other than the one I told it to start on.
>>>>>
>>>>> So I guess two (or three) questions. What is the expected operation
>>>>> after a full cluster reboot (ie: in the event of a power failure)? Why
>>>>> doesn't the engine start automatically, and what might be causing glusterd
>>>>> to fail, when it can be restarted manually and works fine?
>>>>>
>>>>> --
>>>>> _
>>>>> Fact:
>>>>> 1. Ninjas are mammals.
>>>>> 2. Ninjas fight ALL the time.
>>>>> 3. The purpose of the ninja is to flip out and kill people.
>>>>> ___
>>>>> Users mailing list -- users@ovirt.org
>>>>> To unsubscribe send an email to users-le...@ovirt.org
>>>>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>>>>> oVir

[ovirt-users] Re: ovirt-node 4.2 iso - hyperconverged wizard doesn't write gdeployConfig settings

2019-02-04 Thread feral
Fyi, this is just a vanilla install from the ovirt node 4.2 iso. Install 3
nodes, sync up hosts file and exchange SSH keys, and hit the webui for
hyperconverged deployment. The only settings I enter that make it into the
config are the hostnames.
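
(The key exchange is just the usual ssh-copy-id loop from the first node to
all three hostnames; e.g., with placeholder hostnames:

  ssh-keygen -t rsa
  for h in ovirt-431.localdomain ovirt-432.localdomain ovirt-433.localdomain; do
    ssh-copy-id root@$h
  done
)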

On Mon, Feb 4, 2019, 8:44 PM Gobinda Das wrote:
> Sure Greg, I will look into this and get back to you guys.
>
> On Tue, Feb 5, 2019 at 7:22 AM Greg Sheremeta  wrote:
>
>> Sahina, Gobinda,
>>
>> Can you check this thread?
>>
>> On Mon, Feb 4, 2019 at 6:02 PM feral  wrote:
>>
>>> Glusterd was enabled, just crashes on boot. It's a known issue that was
>>> resolved in 3.13, but ovirt-node only has 3.12.
>>> The VM is at that point, paused. So I manually startup glusterd again
>>> and ensure all nodes are online, and then resume the hosted engine.
>>> Sometimes it works, sometimes not.
>>>
>>> I think the issue here is that there are multiple issues with the
>>> current ovirt-node release iso. I was able to get everything working with
>>> Centos base and installing ovirt manually. Still had the same problem with
>>> the gluster wizard not using any of my settings, but after that, and
>>> ensuring i restart all services after a reboot, things came to life.
>>> Trying to discuss with devs, but so far no luck. I keep hearing that the
>>> previous release of ovirt-node (iso) was just much smoother, but haven't
>>> seen anyone addressing the issues in current release.
>>>
>>>
>>> On Mon, Feb 4, 2019 at 2:16 PM Edward Berger 
>>> wrote:
>>>
>>>> On each host you should check if systemctl status glusterd shows
>>>> "enabled" and whatever is the gluster events daemon. (I'm not logged in to
>>>> look right now)
>>>>
>>>> I'm not sure which part of gluster-wizard or hosted-engine engine
>>>> installation is supposed to do the enabling, but I've seen where incomplete
>>>> installs left it disabled.
>>>>
>>>> If the gluster servers haven't come up properly then there's no working
>>>> image for engine.
>>>> I had a situation where it was in a "paused" state and I had to run
>>>> "hosted-engine --vm-status" on possible nodes to find which one has VM in
>>>> paused state
>>>> then log into that node and run this command..
>>>>
>>>> virsh -c
>>>> qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf resume
>>>> HostedEngine
>>>>
>>>>
>>>> On Mon, Feb 4, 2019 at 3:23 PM feral  wrote:
>>>>
>>>>> On that note, have you also had issues with gluster not restarting on
>>>>> reboot, as well as all of the HA stuff failing on reboot after power loss?
>>>>> Thus far, the only way I've got the cluster to come back to life, is to
>>>>> manually restart glusterd on all nodes, then put the cluster back into 
>>>>> "not
>>>>> mainentance" mode, and then manually starting the hosted-engine vm. This
>>>>> also fails after 2 or 3 power losses, even though the entire cluster is
>>>>> happy through the first 2.
>>>>>
>>>>> On Mon, Feb 4, 2019 at 12:21 PM feral  wrote:
>>>>>
>>>>>> Yea, I've been able to build a config manually myself, but sure would
>>>>>> be nice if the gdeploy worked (at all), as it takes an hour to deploy 
>>>>>> every
>>>>>> test, and manually creating the conf, I have to be super conservative 
>>>>>> about
>>>>>> my sizes, as I'm still not entirely sure what the deploy script actually
>>>>>> does. IE: I've got 3 nodes with 1.2TB for the gluster each, but if I try 
>>>>>> to
>>>>>> build a deployment to make use of more than 900GB, it fails as it's
>>>>>> creating the thinpool with whatever size it wants.
>>>>>>
>>>>>> Just wanted to make sure I wasn't the only one having this issue.
>>>>>> Given we know at least two people have noticed, who's the best to 
>>>>>> contact?
>>>>>> I haven't been able to get any response from devs on any of (the myriad)
>>>>>> of issues with the 4.2.8 image.
>>>>>>
>>>>>
>> Have you reported bugs?
>> https://bugzilla.redhat.com

[ovirt-users] Re: ovirt-node 4.2 iso - hyperconverged wizard doesn't write gdeployConfig settings

2019-02-04 Thread feral
Glusterd was enabled; it just crashes on boot. It's a known issue that was
resolved in 3.13, but ovirt-node only has 3.12.
The VM is, at that point, paused. So I manually start glusterd again,
ensure all nodes are online, and then resume the hosted engine. Sometimes
it works, sometimes not.
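
The manual recovery sequence I keep ending up with is roughly this (commands
from memory; the volume name is whatever the wizard created):

  systemctl start glusterd        # on every node
  gluster peer status             # all peers should show Connected
  gluster volume status engine
  hosted-engine --set-maintenance --mode=none
  hosted-engine --vm-status       # find where the engine VM is paused/down
  hosted-engine --vm-start        # only if the HA agents don't start it themselves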

I think the issue here is that there are multiple issues with the current
ovirt-node release iso. I was able to get everything working with a CentOS
base and installing ovirt manually. I still had the same problem with the
gluster wizard not using any of my settings, but after that, and ensuring I
restart all services after a reboot, things came to life.
I've been trying to discuss this with devs, but so far no luck. I keep
hearing that the previous release of ovirt-node (iso) was just much smoother,
but I haven't seen anyone addressing the issues in the current release.


On Mon, Feb 4, 2019 at 2:16 PM Edward Berger  wrote:

> On each host you should check if systemctl status glusterd shows "enabled"
> and whatever is the gluster events daemon. (I'm not logged in to look right
> now)
>
> I'm not sure which part of gluster-wizard or hosted-engine engine
> installation is supposed to do the enabling, but I've seen where incomplete
> installs left it disabled.
>
> If the gluster servers haven't come up properly then there's no working
> image for engine.
> I had a situation where it was in a "paused" state and I had to run
> "hosted-engine --vm-status" on possible nodes to find which one has VM in
> paused state
> then log into that node and run this command..
>
> virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf
> resume HostedEngine
>
>
> On Mon, Feb 4, 2019 at 3:23 PM feral  wrote:
>
>> On that note, have you also had issues with gluster not restarting on
>> reboot, as well as all of the HA stuff failing on reboot after power loss?
>> Thus far, the only way I've got the cluster to come back to life, is to
>> manually restart glusterd on all nodes, then put the cluster back into "not
>> mainentance" mode, and then manually starting the hosted-engine vm. This
>> also fails after 2 or 3 power losses, even though the entire cluster is
>> happy through the first 2.
>>
>> On Mon, Feb 4, 2019 at 12:21 PM feral  wrote:
>>
>>> Yea, I've been able to build a config manually myself, but sure would be
>>> nice if the gdeploy worked (at all), as it takes an hour to deploy every
>>> test, and manually creating the conf, I have to be super conservative about
>>> my sizes, as I'm still not entirely sure what the deploy script actually
>>> does. IE: I've got 3 nodes with 1.2TB for the gluster each, but if I try to
>>> build a deployment to make use of more than 900GB, it fails as it's
>>> creating the thinpool with whatever size it wants.
>>>
>>> Just wanted to make sure I wasn't the only one having this issue. Given
>>> we know at least two people have noticed, who's the best to contact? I
>>> haven't been able to get any response from devs on any of (the myriad)  of
>>> issues with the 4.2.8 image.
>>> Also having a ton of strange issues with the hosted-engine vm deployment.
>>>
>>> On Mon, Feb 4, 2019 at 11:59 AM Edward Berger 
>>> wrote:
>>>
>>>> Yes, I had that issue with an 4.2.8 installation.
>>>> I had to manually edit the "web-UI-generated" config to be anywhere
>>>> close to what I wanted.
>>>>
>>>> I'll attach an edited config as an example.
>>>>
>>>> On Mon, Feb 4, 2019 at 2:51 PM feral  wrote:
>>>>
>>>>> New install of ovirt-node 4.2 (from iso). Setup each node with
>>>>> networking and ssh keys, and use the hyperconverged gluster deployment
>>>>> wizard. None of the user specified settings are ever reflected in the
>>>>> gdeployConfig.conf.
>>>>> Anyone running into this?
>>>>>
>>>>> --
>>>>> _
>>>>> Fact:
>>>>> 1. Ninjas are mammals.
>>>>> 2. Ninjas fight ALL the time.
>>>>> 3. The purpose of the ninja is to flip out and kill people.
>>>>> ___
>>>>> Users mailing list -- users@ovirt.org
>>>>> To unsubscribe send an email to users-le...@ovirt.org
>>>>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>>>>> oVirt Code of Conduct:
>>>>> https://www.ovirt.org/community/about/community-guidelines/

[ovirt-users] Re: rebooting an ovirt cluster

2019-02-04 Thread feral
I think I found the answer to glusterd not starting.
https://bugzilla.redhat.com/show_bug.cgi?id=1472267

Apparently the version of gluster (3.12.15) that comes packaged with
ovirt-node 4.2.8 has a known issue where gluster tries to come up before
networking, fails, and crashes. This was fixed in gluster 3.13.0
(apparently). Do devs peruse this list? Any chance someone who can update
the gluster package might read this?
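
Until the package gets updated, a possible local workaround (untested on my
end; assumes NetworkManager is managing the NICs) would be a systemd drop-in
that orders glusterd after network-online:

mkdir -p /etc/systemd/system/glusterd.service.d
cat > /etc/systemd/system/glusterd.service.d/99-wait-for-network.conf <<'EOF'
[Unit]
Wants=network-online.target
After=network-online.target
EOF
systemctl enable NetworkManager-wait-online.service
systemctl daemon-reload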

On Mon, Feb 4, 2019 at 2:38 AM Simone Tiraboschi 
wrote:

>
>
> On Sat, Feb 2, 2019 at 7:32 PM feral  wrote:
>
>> How is an oVirt hyperconverged cluster supposed to come back to life
>> after a power outage to all 3 nodes?
>>
>> Running ovirt-node (ovirt-node-ng-installer-4.2.0-2019013006.el7.iso) to
>> get things going, but I've run into multiple issues.
>>
>> 1. During the gluster setup, the volume sizes I specify are not
>> reflected in the deployment configuration. The auto-populated values are
>> used every time. I manually hacked on the config to get the volume sizes
>> correct. I also noticed if I create the deployment config with "sdb" by
>> accident, but click back and change it to "vdb", again, the changes are not
>> reflected in the config.
>> My deployment config does seem to work. All volumes are created (though
>> the xfs options used don't make sense as you end up with stripe sizes that
>> aren't a multiple of the block size).
>> Once gluster is deployed, I deploy the hosted engine, and everything
>> works.
>>
>> 2. Reboot all nodes. I was testing for power outage response. All nodes
>> come up, but glusterd is not running (seems to have failed for some
>> reason). I can manually restart glusterd on all nodes and it comes up and
>> starts communicating normally. However, the engine does not come online. So
>> I figure out where it last lived, and try to start it manually through the
>> web interface. This fails because vdsm-ovirtmgmt is not up. I figured out
>> the correct way to start up the engine would be through the cli via
>> hosted-engine --vm-start.
>>
>
> This is not required at all.
> Are you sure that your cluster is not set in global maintenance mode?
> Can you please share /var/log/ovirt-hosted-engine-ha/agent.log and
> broker.log from your hosts?
>
>
>> This does work, but it takes a very long time, and it usually starts up
>> on any node other than the one I told it to start on.
>>
>> So I guess two (or three) questions. What is the expected operation after
>> a full cluster reboot (ie: in the event of a power failure)? Why doesn't
>> the engine start automatically, and what might be causing glusterd to fail,
>> when it can be restarted manually and works fine?
>>
>> --
>> _
>> Fact:
>> 1. Ninjas are mammals.
>> 2. Ninjas fight ALL the time.
>> 3. The purpose of the ninja is to flip out and kill people.
>>
>

-- 
_
Fact:
1. Ninjas are mammals.
2. Ninjas fight ALL the time.
3. The purpose of the ninja is to flip out and kill people.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/WNN3PJFFP4VU5YAPDNYC7WQOTBDXDKPC/


[ovirt-users] Re: ovirt-node 4.2 iso - hyperconverged wizard doesn't write gdeployConfig settings

2019-02-04 Thread feral
On that note, have you also had issues with gluster not restarting on
reboot, as well as all of the HA stuff failing on reboot after power loss?
Thus far, the only way I've got the cluster to come back to life is to
manually restart glusterd on all nodes, then put the cluster back into "not
maintenance" mode, and then manually start the hosted-engine vm. This
also fails after 2 or 3 power losses, even though the entire cluster is
happy through the first 2.
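
For anyone else hitting this, the manual sequence amounts to roughly the
following (hostnames are placeholders; run the hosted-engine commands from
whichever node you want the engine to land on):

for h in node1 node2 node3; do ssh root@$h systemctl restart glusterd; done
hosted-engine --set-maintenance --mode=none
hosted-engine --vm-start
hosted-engine --vm-status    # repeat until the engine VM reports "up"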

On Mon, Feb 4, 2019 at 12:21 PM feral  wrote:

> Yea, I've been able to build a config manually myself, but sure would be
> nice if the gdeploy worked (at all), as it takes an hour to deploy every
> test, and manually creating the conf, I have to be super conservative about
> my sizes, as I'm still not entirely sure what the deploy script actually
> does. IE: I've got 3 nodes with 1.2TB for the gluster each, but if I try to
> build a deployment to make use of more than 900GB, it fails as it's
> creating the thinpool with whatever size it wants.
>
> Just wanted to make sure I wasn't the only one having this issue. Given we
> know at least two people have noticed, who's the best to contact? I haven't
> been able to get any response from devs on any of the (myriad) issues
> with the 4.2.8 image.
> Also having a ton of strange issues with the hosted-engine vm deployment.
>
> On Mon, Feb 4, 2019 at 11:59 AM Edward Berger  wrote:
>
>> Yes, I had that issue with an 4.2.8 installation.
>> I had to manually edit the "web-UI-generated" config to be anywhere close
>> to what I wanted.
>>
>> I'll attach an edited config as an example.
>>
>> On Mon, Feb 4, 2019 at 2:51 PM feral  wrote:
>>
>>> New install of ovirt-node 4.2 (from iso). Setup each node with
>>> networking and ssh keys, and use the hyperconverged gluster deployment
>>> wizard. None of the user specified settings are ever reflected in the
>>> gdeployConfig.conf.
>>> Anyone running into this?
>>>
>>> --
>>> _
>>> Fact:
>>> 1. Ninjas are mammals.
>>> 2. Ninjas fight ALL the time.
>>> 3. The purpose of the ninja is to flip out and kill people.
>>>
>>
>
> --
> _
> Fact:
> 1. Ninjas are mammals.
> 2. Ninjas fight ALL the time.
> 3. The purpose of the ninja is to flip out and kill people.
>


-- 
_
Fact:
1. Ninjas are mammals.
2. Ninjas fight ALL the time.
3. The purpose of the ninja is to flip out and kill people.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/X4VYG2MCSJWCRDA4MGN27CICTTYDS7ZH/


[ovirt-users] Re: ovirt-node 4.2 iso - hyperconverged wizard doesn't write gdeployConfig settings

2019-02-04 Thread feral
Yea, I've been able to build a config manually myself, but it sure would be
nice if gdeploy worked (at all), as it takes an hour to deploy every test.
When creating the conf manually, I have to be super conservative about my
sizes, as I'm still not entirely sure what the deploy script actually does.
IE: I've got 3 nodes with 1.2TB each for gluster, but if I try to build a
deployment that uses more than 900GB, it fails because it creates the
thinpool with whatever size it wants.
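
For the record, the stanza I end up hand-editing looks roughly like this
(hostnames, device and sizes are placeholders, and the exact keys may differ
between gdeploy versions):

[lv1:{host1,host2,host3}]
action=create
vgname=gluster_vg_vdb
lvname=gluster_thinpool_vdb
lvtype=thinpool
size=1100GB
poolmetadatasize=16GB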

Just wanted to make sure I wasn't the only one having this issue. Given we
know at least two people have noticed, who's the best to contact? I haven't
been able to get any response from devs on any of the (myriad) issues
with the 4.2.8 image.
Also having a ton of strange issues with the hosted-engine vm deployment.

On Mon, Feb 4, 2019 at 11:59 AM Edward Berger  wrote:

> Yes, I had that issue with an 4.2.8 installation.
> I had to manually edit the "web-UI-generated" config to be anywhere close
> to what I wanted.
>
> I'll attach an edited config as an example.
>
> On Mon, Feb 4, 2019 at 2:51 PM feral  wrote:
>
>> New install of ovirt-node 4.2 (from iso). Setup each node with networking
>> and ssh keys, and use the hyperconverged gluster deployment wizard. None of
>> the user specified settings are ever reflected in the gdeployConfig.conf.
>> Anyone running into this?
>>
>> --
>> _
>> Fact:
>> 1. Ninjas are mammals.
>> 2. Ninjas fight ALL the time.
>> 3. The purpose of the ninja is to flip out and kill people.
>>
>

-- 
_
Fact:
1. Ninjas are mammals.
2. Ninjas fight ALL the time.
3. The purpose of the ninja is to flip out and kill people.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/K2GZOMMG3SIVKOPT6O6ZCRYNHTRA6KCB/


[ovirt-users] ovirt-node 4.2 iso - hyperconverged wizard doesn't write gdeployConfig settings

2019-02-04 Thread feral
New install of ovirt-node 4.2 (from iso). Setup each node with networking
and ssh keys, and use the hyperconverged gluster deployment wizard. None of
the user specified settings are ever reflected in the gdeployConfig.conf.
Anyone running into this?

-- 
_
Fact:
1. Ninjas are mammals.
2. Ninjas fight ALL the time.
3. The purpose of the ninja is to flip out and kill people.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/TF56FSFRNGCWEM4VJFOGKKAELJ3ID7NR/


[ovirt-users] Re: rebooting an ovirt cluster

2019-02-03 Thread feral
So why is this the default behavior of the ovirt Node distro?

On Sun, Feb 3, 2019 at 5:16 PM Strahil  wrote:

>
> 2. Reboot all nodes. I was testing for power outage response. All nodes
> come up, but glusterd is not running (seems to have failed for some
> reason). I can manually restart glusterd on all nodes and it comes up and
> starts communicating normally. However, the engine does not come online. So
> I figure out where it last lived, and try to start it manually through the
> web interface. This fails because vdsm-ovirtmgmt is not up. I figured out
> the correct way to start up the engine would be through the cli via
> hosted-engine --vm-start. This does work, but it takes a very long time,
> and it usually starts up on any node other than the one I told it to start
> on.
>
> If you use fstab - prepare for pain... Systemd mounts are more effective.
> Here is a sample:
>
> [root@ovirt1 ~]# systemctl cat gluster_bricks-engine.mount
> # /etc/systemd/system/gluster_bricks-engine.mount
> [Unit]
> Description=Mount glusterfs brick - ENGINE
> Requires = vdo.service
> After = vdo.service
> Before = glusterd.service
> Conflicts = umount.target
>
> [Mount]
> What=/dev/mapper/gluster_vg_md0-gluster_lv_engine
> Where=/gluster_bricks/engine
> Type=xfs
> Options=inode64,noatime,nodiratime
>
> [Install]
> WantedBy=glusterd.service
>
> [root@ovirt1 ~]# systemctl cat glusterd.service
> # /etc/systemd/system/glusterd.service
> [Unit]
> Description=GlusterFS, a clustered file-system server
> Requires=rpcbind.service gluster_bricks-engine.mount gluster_bricks-data.mount
> After=network.target rpcbind.service gluster_bricks-engine.mount
> Before=network-online.target
>
> [Service]
> Type=forking
> PIDFile=/var/run/glusterd.pid
> LimitNOFILE=65536
> Environment="LOG_LEVEL=INFO"
> EnvironmentFile=-/etc/sysconfig/glusterd
> ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL
> KillMode=process
> SuccessExitStatus=15
>
> [Install]
> WantedBy=multi-user.target
>
> # /etc/systemd/system/glusterd.service.d/99-cpu.conf
> [Service]
> CPUAccounting=yes
> Slice=glusterfs.slice
>
>
> Note : Some of the 'After=' and 'Requires='  entries were removed during
> copy-pasting.
>
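(For anyone reproducing this: after dropping the units in, they still need to
be activated - unit names here are taken from the example above:)

systemctl daemon-reload
systemctl enable gluster_bricks-engine.mount gluster_bricks-data.mount
systemctl enable glusterd.service
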
> So I guess two (or three) questions. What is the expected operation after
> a full cluster reboot (ie: in the event of a power failure)? Why doesn't
> the engine start automatically, and what might be causing glusterd to fail,
> when it can be restarted manually and works fine?
>
>
>
> Expected - everything to be up and running.
> Root cause: the system's fstab generator mounts the bricks only after
> gluster has already tried to start them - and of course that fails.
> Then everything on the chain fails.
>
> Just use systemd's mount entries (I have added automount also) and you
> won't have such issues.
>
> Best Regards,
> Strahil Nikolov
>
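For the automount variant mentioned above, the matching unit would presumably
look something like this (untested sketch, same mount point assumed):

# /etc/systemd/system/gluster_bricks-engine.automount
[Unit]
Description=Automount glusterfs brick - ENGINE
Requires=vdo.service
After=vdo.service

[Automount]
Where=/gluster_bricks/engine

[Install]
WantedBy=multi-user.target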


-- 
_
Fact:
1. Ninjas are mammals.
2. Ninjas fight ALL the time.
3. The purpose of the ninja is to flip out and kill people.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/ZHCHWBBC2MB2UZYWIZH7UF2IQZOUKA7Q/


[ovirt-users] rebooting an ovirt cluster

2019-02-02 Thread feral
How is an oVirt hyperconverged cluster supposed to come back to life after
a power outage to all 3 nodes?

Running ovirt-node (ovirt-node-ng-installer-4.2.0-2019013006.el7.iso) to
get things going, but I've run into multiple issues.

1. During the gluster setup, the volume sizes I specify are not reflected
in the deployment configuration. The auto-populated values are used every
time. I manually hacked on the config to get the volume sizes correct. I
also noticed if I create the deployment config with "sdb" by accident, but
click back and change it to "vdb", again, the changes are not reflected in
the config.
My deployment config does seem to work. All volumes are created (though the
xfs options used don't make sense as you end up with stripe sizes that
aren't a multiple of the block size).
Once gluster is deployed, I deploy the hosted engine, and everything works.
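
To illustrate what I mean by alignment, I'd expect something along these lines
for, say, a RAID6 of 12 disks with a 256k stripe unit (geometry and device
path are placeholders, not what gdeploy actually ran):

mkfs.xfs -f -i size=512 -d su=256k,sw=10 /dev/gluster_vg/gluster_lv_data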

2. Reboot all nodes. I was testing for power outage response. All nodes
come up, but glusterd is not running (seems to have failed for some
reason). I can manually restart glusterd on all nodes and it comes up and
starts communicating normally. However, the engine does not come online. So
I figure out where it last lived, and try to start it manually through the
web interface. This fails because vdsm-ovirtmgmt is not up. I figured out
the correct way to start up the engine would be through the cli via
hosted-engine --vm-start. This does work, but it takes a very long time,
and it usually starts up on any node other than the one I told it to start
on.

So I guess two (or three) questions. What is the expected operation after a
full cluster reboot (ie: in the event of a power failure)? Why doesn't the
engine start automatically, and what might be causing glusterd to fail,
when it can be restarted manually and works fine?

-- 
_
Fact:
1. Ninjas are mammals.
2. Ninjas fight ALL the time.
3. The purpose of the ninja is to flip out and kill people.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/RIADNRZRXTPTRG4XBFUMNWASBWRFCG4V/