[ovirt-users] Re: Recovery from power outage

2021-02-04 Thread Roderick Mooi

Thanks so much, this worked!

For the record/list benefit, I first put the host into maintenance and then 
selected Enroll Certificate - this regenerated the certs.
(VDSM cert can be checked with:
certtool -i --infile /etc/pki/vdsm/certs/vdsmcert.pem)
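
For anyone who wants to script that check, here is a minimal sketch. It assumes openssl is installed on the host and that the certificate subject is printed in the usual "CN=name" or "CN = name" form. The function names (cn_from_subject, check_vdsm_cert) are made up for illustration, and comparing against `hostname -f` is an assumption: the engine may know the host under a different name or address.

```shell
#!/bin/sh
# Hypothetical helper: pull the CN value out of an X.509 subject line.
# Accepts both "CN=host" and "CN = host", with "," or "/" as separators.
cn_from_subject() {
    printf '%s\n' "$1" | sed -n 's|.*CN *= *\([^,/]*\).*|\1|p'
}

# Compare the CN of the VDSM certificate with this host's FQDN.
# Returns 0 on match, 1 on mismatch, 2 if the certificate cannot be read.
check_vdsm_cert() {
    cert=${1:-/etc/pki/vdsm/certs/vdsmcert.pem}
    subject=$(openssl x509 -in "$cert" -noout -subject) || return 2
    cn=$(cn_from_subject "$subject")
    if [ "$cn" = "$(hostname -f)" ]; then
        echo "OK: certificate CN is $cn"
    else
        echo "MISMATCH: certificate CN is $cn, host is $(hostname -f)"
        return 1
    fi
}
```

Calling check_vdsm_cert with no argument on the affected host would have flagged the copied certificate, since its CN carried the name of host 2.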

I then took these steps on the affected (incorrectly reported) host to update 
the hosted-engine --vm-status:
1. hosted-engine --set-maintenance --mode=global
2. systemctl stop ovirt-ha-agent.service
3. hosted-engine --clean-metadata
4. systemctl start ovirt-ha-agent.service
5. hosted-engine --vm-status (after a minute or two - to verify that the host 
details are now correct)
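
The five steps can be wrapped in a small script. This is only a sketch of what I typed by hand: the run and refresh_ha_metadata names and the DRY_RUN switch are illustrative, and the 120-second wait is a guess at "a minute or two". With DRY_RUN=1 (the default here) it only prints the commands.

```shell
#!/bin/sh
# With DRY_RUN=1 the commands are printed instead of executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps 1-5 from the list above, stopping at the first failure.
refresh_ha_metadata() {
    run hosted-engine --set-maintenance --mode=global || return 1
    run systemctl stop ovirt-ha-agent.service || return 1
    run hosted-engine --clean-metadata || return 1
    run systemctl start ovirt-ha-agent.service || return 1
    # Give the agent time to republish its state before checking.
    run sleep 120 || return 1
    run hosted-engine --vm-status
}
```

Run refresh_ha_metadata once in dry-run mode to review the commands, then again with DRY_RUN=0 on the affected host. The list above does not include leaving global maintenance; Didi's outline further down the thread ends with that step.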

Cheers :)

On 2021/02/04 10:17, Yedidyah Bar David wrote:

On Thu, Feb 4, 2021 at 10:09 AM Roderick Mooi  wrote:


Hi Didi!

Ok, I started the clean metadata process and then found the real issue - I had 
copied the certs (just /etc/pki/vdsm; other pki folders were intact) from a 
working host (host 2) to host 1 following the re-deploy cleanup as part of the 
process to get it online again. The problem is the cert contains the hostname 
(so now the cert on host 1 contains as Subject CN the hostname of host 2).


Right. Sorry I didn't remember that.


I found some docs on the certs for libvirt but it's not clear what I need to do 
to correctly re-generate the vdsm certs on host 1. Can you help? PS I presume I 
need to re-generate client certs for that host as well and copy to the engine?


Easiest is to put the host to maintenance, then "Enroll Certificate" -
IIRC this should be enough. If you want to make sure, perhaps better
remove all certs/keys and do 'Reinstall' instead, and make sure you
choose 'Deploy' for 'Hosted Engine'.

Good luck,



Appreciated,

Roderick


On 2021/02/03 16:58, Yedidyah Bar David wrote:

On Wed, Feb 3, 2021 at 4:52 PM Roderick Mooi  wrote:


Thanks,


I didn't check, but am pretty certain that it's not related to the
engine db. Do you see such duplicates there as well (using the web ui
or sql against it)? If so, fix these first. If no other means, put the
host to maintenance and reinstall with the correct name.


Not seeing duplicates in the web UI, only in the --vm-status. Can you please 
assist me with the sql commands or reference to the database schema + where to 
check? I'd like to check that first before doing anything too drastic.


/usr/share/ovirt-engine/dbscripts/engine-psql.sh -c 'select * from vds'
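
To spot duplicates without reading the full "select *" output, something like the sketch below may help. The column names (vds_name, host_name, vds_spm_id) are an assumption and should be checked against the output of the query above first; the sketch also assumes the engine-psql.sh wrapper passes psql options such as -At through unchanged. The function names are illustrative.

```shell
#!/bin/sh
# Print every value that occurs more than once in the given column of
# '|'-separated input (the format psql emits with -A -t).
dup_values() {
    awk -F'|' -v col="$1" '
        { seen[$col]++ }
        END { for (v in seen) if (seen[v] > 1) print v }
    ' | sort
}

# Assumed column names; verify them against "select * from vds".
list_hosts() {
    /usr/share/ovirt-engine/dbscripts/engine-psql.sh -At \
        -c 'select vds_name, host_name, vds_spm_id from vds order by vds_spm_id'
}
```

On the engine VM, "list_hosts | dup_values 2" would list any hostname used by more than one host, and "list_hosts | dup_values 3" any repeated SPM host id. Empty output means no duplicates in that column.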



Note: it only duplicated the hostname after I changed the host_id, before that 
it had the correct hostname but duplicate host_id.

PS I have a recent backup of the database (just before), which I could restore if 
you think that'll do the trick without breaking anything?


On 2021/02/03 16:33, Yedidyah Bar David wrote:

On Wed, Feb 3, 2021 at 4:21 PM Roderick Mooi  wrote:


Hi,


Any idea how this happened?


Somehow related to the power being "pulled" at the wrong time?


Perhaps this is a backup done by emacs?


Not sure what does it but I'm glad it did ;)


Please compare it to your other hosts. It should be (mostly?)
identical, but make sure that host_id= is unique per host. It should
match the spm host id for this host in the engine database.


I had to restore one of my hosts (host 1) manually due to a cleanup during my 
re-deploy attempts. I managed to do this successfully by copying the missing 
files from another host (host 2) but the first time the host ID matched one of 
the other hosts (which made at least hosted-engine --vm-status unhappy) [I 
hadn't seen your email yet :(]. I subsequently corrected the host_id and 
rebooted the guilty host. Things mostly seem to be working now except that in 
hosted-engine --vm-status my first two hosts (the one I copied the .conf from 
as well as the one I copied it to [without changing the ID :O]) now have the 
same hostname :-/ I'm assuming there's a mismatch in the engine database - 
where/how do I fix that?



I didn't check, but am pretty certain that it's not related to the
engine db. Do you see such duplicates there as well (using the web ui
or sql against it)? If so, fix these first. If no other means, put the
host to maintenance and reinstall with the correct name.

If it's just the shared storage, you can try the following. Carefully.
Didn't try myself. Try on a test system first.

1. Set global maintenance

2. Stop ovirt-ha-agent, ovirt-ha-broker, perhaps also vdsmd, supervdsmd

3. hosted-engine --clean-metadata --host-id=1

- Perhaps even pass --force-cleanup, not sure when it's needed

- Repeat for other IDs as needed

4. Start ovirt-ha-agent (I think this should start all the others, but
make sure)

5. Wait a bit. I am pretty certain that they should recreate their
entries in the shared storage and eventually --vm-status should look
ok.

6. Exit global maintenance

Good luck,


Appreciated! (and happy cos our cluster is almost back to normal :) )

On 2021/02/03 11:30, Yedidyah Bar David wrote:

On Wed, Feb 3, 2021 at 11:12 AM Roderick Mooi  wrote:


Hello and than

[ovirt-users] Re: Recovery from power outage

2021-02-04 Thread Roderick Mooi

Hi Didi!

Ok, I started the clean metadata process and then found the real issue - I had 
copied the certs (just /etc/pki/vdsm; other pki folders were intact) from a 
working host (host 2) to host 1 following the re-deploy cleanup as part of the 
process to get it online again. The problem is the cert contains the hostname 
(so now the cert on host 1 contains as Subject CN the hostname of host 2). I 
found some docs on the certs for libvirt but it's not clear what I need to do 
to correctly re-generate the vdsm certs on host 1. Can you help? PS I presume I 
need to re-generate client certs for that host as well and copy to the engine?

Appreciated,

Roderick


On 2021/02/03 16:58, Yedidyah Bar David wrote:

On Wed, Feb 3, 2021 at 4:52 PM Roderick Mooi  wrote:


Thanks,


I didn't check, but am pretty certain that it's not related to the
engine db. Do you see such duplicates there as well (using the web ui
or sql against it)? If so, fix these first. If no other means, put the
host to maintenance and reinstall with the correct name.


Not seeing duplicates in the web UI, only in the --vm-status. Can you please 
assist me with the sql commands or reference to the database schema + where to 
check? I'd like to check that first before doing anything too drastic.


/usr/share/ovirt-engine/dbscripts/engine-psql.sh -c 'select * from vds'



Note: it only duplicated the hostname after I changed the host_id, before that 
it had the correct hostname but duplicate host_id.

PS I have a recent backup of the database (just before), which I could restore if 
you think that'll do the trick without breaking anything?


On 2021/02/03 16:33, Yedidyah Bar David wrote:

On Wed, Feb 3, 2021 at 4:21 PM Roderick Mooi  wrote:


Hi,


Any idea how this happened?


Somehow related to the power being "pulled" at the wrong time?


Perhaps this is a backup done by emacs?


Not sure what does it but I'm glad it did ;)


Please compare it to your other hosts. It should be (mostly?)
identical, but make sure that host_id= is unique per host. It should
match the spm host id for this host in the engine database.


I had to restore one of my hosts (host 1) manually due to a cleanup during my 
re-deploy attempts. I managed to do this successfully by copying the missing 
files from another host (host 2) but the first time the host ID matched one of 
the other hosts (which made at least hosted-engine --vm-status unhappy) [I 
hadn't seen your email yet :(]. I subsequently corrected the host_id and 
rebooted the guilty host. Things mostly seem to be working now except that in 
hosted-engine --vm-status my first two hosts (the one I copied the .conf from 
as well as the one I copied it to [without changing the ID :O]) now have the 
same hostname :-/ I'm assuming there's a mismatch in the engine database - 
where/how do I fix that?



I didn't check, but am pretty certain that it's not related to the
engine db. Do you see such duplicates there as well (using the web ui
or sql against it)? If so, fix these first. If no other means, put the
host to maintenance and reinstall with the correct name.

If it's just the shared storage, you can try the following. Carefully.
Didn't try myself. Try on a test system first.

1. Set global maintenance

2. Stop ovirt-ha-agent, ovirt-ha-broker, perhaps also vdsmd, supervdsmd

3. hosted-engine --clean-metadata --host-id=1

- Perhaps even pass --force-cleanup, not sure when it's needed

- Repeat for other IDs as needed

4. Start ovirt-ha-agent (I think this should start all the others, but
make sure)

5. Wait a bit. I am pretty certain that they should recreate their
entries in the shared storage and eventually --vm-status should look
ok.

6. Exit global maintenance

Good luck,


Appreciated! (and happy cos our cluster is almost back to normal :) )

On 2021/02/03 11:30, Yedidyah Bar David wrote:

On Wed, Feb 3, 2021 at 11:12 AM Roderick Mooi  wrote:


Hello and thanks for assisting!

I think I may have found the problem :)

/etc/ovirt-hosted-engine/hosted-engine.conf

is blank.

But I do have hosted-engine.conf~


Any idea how this happened?

Perhaps this is a backup done by emacs?



Can I cp this to restore the original?


Please compare it to your other hosts. It should be (mostly?)
identical, but make sure that host_id= is unique per host. It should
match the spm host id for this host in the engine database.
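
A small sketch for that comparison, assuming the other host's file has been copied over for inspection (for example to /tmp/host2.conf, a made-up path). The function names are illustrative. It reads host_id from each file and checks whether anything else differs; since the files are only expected to be "mostly" identical, a reported difference is a prompt to look closer, not proof of a problem.

```shell
#!/bin/sh
# Print the host_id value from a hosted-engine.conf style file.
conf_host_id() {
    sed -n 's/^host_id=//p' "$1"
}

# Succeed if two conf files are identical apart from their host_id lines
# (line order is ignored).
same_except_host_id() {
    a=$(grep -v '^host_id=' "$1" | sort)
    b=$(grep -v '^host_id=' "$2" | sort)
    [ "$a" = "$b" ]
}
```

For example, on host 1: conf_host_id /etc/ovirt-hosted-engine/hosted-engine.conf and conf_host_id /tmp/host2.conf should print different numbers, and same_except_host_id on the two files shows whether the rest matches.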



Anything else I need to do?


Not sure, but better find the root cause to make sure no other damage was done.

Good luck,



Appreciated


On 2021/02/02 11:37, Strahil Nikolov wrote:

Usually,

I would start with checking the output of the 
/var/log/ovirt-hosted-engine-ha/{broker,agent}.log

I'm typing it on my phone, so the path could have a typo.

Check if the following services (also typed by memory, might have to remove the 
'd') are running:
- sanlock
- supervdsmd
- vdsmd


Sometimes, some of my VGs (gluster) are not activated, so if you run 
hyperconverged -> you can 'vgchange -ay'.

[ovirt-users] Re: Recovery from power outage

2021-02-03 Thread Roderick Mooi

Thanks,


I didn't check, but am pretty certain that it's not related to the
engine db. Do you see such duplicates there as well (using the web ui
or sql against it)? If so, fix these first. If no other means, put the
host to maintenance and reinstall with the correct name.


Not seeing duplicates in the web UI, only in the --vm-status. Can you please 
assist me with the sql commands or reference to the database schema + where to 
check? I'd like to check that first before doing anything too drastic.

Note: it only duplicated the hostname after I changed the host_id, before that 
it had the correct hostname but duplicate host_id.

PS I have a recent backup of the database (just before), which I could restore if 
you think that'll do the trick without breaking anything?


On 2021/02/03 16:33, Yedidyah Bar David wrote:

On Wed, Feb 3, 2021 at 4:21 PM Roderick Mooi  wrote:


Hi,


Any idea how this happened?


Somehow related to the power being "pulled" at the wrong time?


Perhaps this is a backup done by emacs?


Not sure what does it but I'm glad it did ;)


Please compare it to your other hosts. It should be (mostly?)
identical, but make sure that host_id= is unique per host. It should
match the spm host id for this host in the engine database.


I had to restore one of my hosts (host 1) manually due to a cleanup during my 
re-deploy attempts. I managed to do this successfully by copying the missing 
files from another host (host 2) but the first time the host ID matched one of 
the other hosts (which made at least hosted-engine --vm-status unhappy) [I 
hadn't seen your email yet :(]. I subsequently corrected the host_id and 
rebooted the guilty host. Things mostly seem to be working now except that in 
hosted-engine --vm-status my first two hosts (the one I copied the .conf from 
as well as the one I copied it to [without changing the ID :O]) now have the 
same hostname :-/ I'm assuming there's a mismatch in the engine database - 
where/how do I fix that?



I didn't check, but am pretty certain that it's not related to the
engine db. Do you see such duplicates there as well (using the web ui
or sql against it)? If so, fix these first. If no other means, put the
host to maintenance and reinstall with the correct name.

If it's just the shared storage, you can try the following. Carefully.
Didn't try myself. Try on a test system first.

1. Set global maintenance

2. Stop ovirt-ha-agent, ovirt-ha-broker, perhaps also vdsmd, supervdsmd

3. hosted-engine --clean-metadata --host-id=1

- Perhaps even pass --force-cleanup, not sure when it's needed

- Repeat for other IDs as needed

4. Start ovirt-ha-agent (I think this should start all the others, but
make sure)

5. Wait a bit. I am pretty certain that they should recreate their
entries in the shared storage and eventually --vm-status should look
ok.

6. Exit global maintenance

Good luck,


Appreciated! (and happy cos our cluster is almost back to normal :) )

On 2021/02/03 11:30, Yedidyah Bar David wrote:

On Wed, Feb 3, 2021 at 11:12 AM Roderick Mooi  wrote:


Hello and thanks for assisting!

I think I may have found the problem :)

/etc/ovirt-hosted-engine/hosted-engine.conf

is blank.

But I do have hosted-engine.conf~


Any idea how this happened?

Perhaps this is a backup done by emacs?



Can I cp this to restore the original?


Please compare it to your other hosts. It should be (mostly?)
identical, but make sure that host_id= is unique per host. It should
match the spm host id for this host in the engine database.



Anything else I need to do?


Not sure, but better find the root cause to make sure no other damage was done.

Good luck,



Appreciated


On 2021/02/02 11:37, Strahil Nikolov wrote:

Usually,

I would start with checking the output of the 
/var/log/ovirt-hosted-engine-ha/{broker,agent}.log

I'm typing it on my phone, so the path could have a typo.

Check if the following services (also typed by memory, might have to remove the 
'd') are running:
- sanlock
- supervdsmd
- vdsmd


Sometimes, some of my VGs (gluster) are not activated, so if you run 
hyperconverged -> you can 'vgchange -ay'.

Best Regards,
Strahil Nikolov



  On Tue, Feb 2, 2021 at 11:28, Roderick Mooi
   wrote:
  Hi!

  We had a power outage and all our servers (oVirt hosts) went down. When 
they started up neither the hosted-engine nor VMs were started.

  hosted-engine --vm-status
  says:
  You must run deploy first

  I tried running deploy with various options but ultimately get stuck at:

  The Host ID is already known. Is this a re-deployment on an additional 
host that was previously set up (Yes, No)[Yes]?
  ...
  [ ERROR ] Failed to execute stage 'Closing up': 

  OR

  The specified storage location alre

[ovirt-users] Re: Recovery from power outage

2021-02-03 Thread Roderick Mooi

Hi,


Any idea how this happened?


Somehow related to the power being "pulled" at the wrong time?


Perhaps this is a backup done by emacs?


Not sure what does it but I'm glad it did ;)


Please compare it to your other hosts. It should be (mostly?)
identical, but make sure that host_id= is unique per host. It should
match the spm host id for this host in the engine database.


I had to restore one of my hosts (host 1) manually due to a cleanup during my 
re-deploy attempts. I managed to do this successfully by copying the missing 
files from another host (host 2) but the first time the host ID matched one of 
the other hosts (which made at least hosted-engine --vm-status unhappy) [I 
hadn't seen your email yet :(]. I subsequently corrected the host_id and 
rebooted the guilty host. Things mostly seem to be working now except that in 
hosted-engine --vm-status my first two hosts (the one I copied the .conf from 
as well as the one I copied it to [without changing the ID :O]) now have the 
same hostname :-/ I'm assuming there's a mismatch in the engine database - 
where/how do I fix that?

Appreciated! (and happy cos our cluster is almost back to normal :) )

On 2021/02/03 11:30, Yedidyah Bar David wrote:

On Wed, Feb 3, 2021 at 11:12 AM Roderick Mooi  wrote:


Hello and thanks for assisting!

I think I may have found the problem :)

/etc/ovirt-hosted-engine/hosted-engine.conf

is blank.

But I do have hosted-engine.conf~


Any idea how this happened?

Perhaps this is a backup done by emacs?



Can I cp this to restore the original?


Please compare it to your other hosts. It should be (mostly?)
identical, but make sure that host_id= is unique per host. It should
match the spm host id for this host in the engine database.



Anything else I need to do?


Not sure, but better find the root cause to make sure no other damage was done.

Good luck,



Appreciated


On 2021/02/02 11:37, Strahil Nikolov wrote:

Usually,

I would start with checking the output of the 
/var/log/ovirt-hosted-engine-ha/{broker,agent}.log

I'm typing it on my phone, so the path could have a typo.

Check if the following services (also typed by memory, might have to remove the 
'd') are running:
- sanlock
- supervdsmd
- vdsmd


Sometimes, some of my VGs (gluster) are not activated, so if you run 
hyperconverged -> you can 'vgchange -ay'.

Best Regards,
Strahil Nikolov



 On Tue, Feb 2, 2021 at 11:28, Roderick Mooi
  wrote:
 Hi!

 We had a power outage and all our servers (oVirt hosts) went down. When 
they started up neither the hosted-engine nor VMs were started.

 hosted-engine --vm-status
 says:
 You must run deploy first

 I tried running deploy with various options but ultimately get stuck at:

 The Host ID is already known. Is this a re-deployment on an additional 
host that was previously set up (Yes, No)[Yes]?
 ...
 [ ERROR ] Failed to execute stage 'Closing up': 

 OR

 The specified storage location already contains a data domain. Is this an 
additional host setup (Yes, No)[Yes]? No
 [ ERROR ] Re-deploying the engine VM over a previously (partially) 
deployed system is not supported. Please clean up the storage device or select 
a different one and retry.

 NOTES:
 1. This is oVirt v3.6 (legacy install, I know...)
 2. We do have daily engine backups (.bak files) [till the day the power 
failed]

 Any advice/assistance appreciated.

 Thanks!

 Roderick
 ___
 Users mailing list -- users@ovirt.org
 To unsubscribe send an email to users-le...@ovirt.org
 Privacy Statement: https://www.ovirt.org/privacy-policy.html
 oVirt Code of Conduct:
 https://www.ovirt.org/community/about/community-guidelines/
 List Archives:
 https://lists.ovirt.org/archives/list/users@ovirt.org/message/73VDY7KLYBKCUXOUU4YTS4ZFGXN2ZX2U/


___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/HTWNERBX42JNOMONSCG6BL2MCIQZDW7C/






[ovirt-users] Re: Recovery from power outage

2021-02-03 Thread Roderick Mooi

Hello and thanks for assisting!

I think I may have found the problem :)

/etc/ovirt-hosted-engine/hosted-engine.conf

is blank.

But I do have hosted-engine.conf~

Can I cp this to restore the original?

Anything else I need to do?

Appreciated


On 2021/02/02 11:37, Strahil Nikolov wrote:

Usually,

I would start with checking the output of the 
/var/log/ovirt-hosted-engine-ha/{broker,agent}.log

I'm typing it on my phone, so the path could have a typo.

Check if the following services (also typed by memory, might have to remove the 
'd') are running:
- sanlock
- supervdsmd
- vdsmd


Sometimes, some of my VGs (gluster) are not activated, so if you run 
hyperconverged -> you can 'vgchange -ay'.

Best Regards,
Strahil Nikolov



On Tue, Feb 2, 2021 at 11:28, Roderick Mooi
 wrote:
Hi!

We had a power outage and all our servers (oVirt hosts) went down. When 
they started up neither the hosted-engine nor VMs were started.

hosted-engine --vm-status
says:
You must run deploy first

I tried running deploy with various options but ultimately get stuck at:

The Host ID is already known. Is this a re-deployment on an additional host 
that was previously set up (Yes, No)[Yes]?
...
[ ERROR ] Failed to execute stage 'Closing up': 

OR

The specified storage location already contains a data domain. Is this an 
additional host setup (Yes, No)[Yes]? No
[ ERROR ] Re-deploying the engine VM over a previously (partially) deployed 
system is not supported. Please clean up the storage device or select a 
different one and retry.

NOTES:
1. This is oVirt v3.6 (legacy install, I know...)
2. We do have daily engine backups (.bak files) [till the day the power 
failed]

Any advice/assistance appreciated.

Thanks!

Roderick
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct:
https://www.ovirt.org/community/about/community-guidelines/
List Archives:
https://lists.ovirt.org/archives/list/users@ovirt.org/message/73VDY7KLYBKCUXOUU4YTS4ZFGXN2ZX2U/


___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/HTWNERBX42JNOMONSCG6BL2MCIQZDW7C/


[ovirt-users] Recovery from power outage

2021-02-02 Thread Roderick Mooi

Hi!

We had a power outage and all our servers (oVirt hosts) went down. When they 
started up neither the hosted-engine nor VMs were started.

hosted-engine --vm-status
says:
You must run deploy first

I tried running deploy with various options but ultimately get stuck at:

The Host ID is already known. Is this a re-deployment on an additional host 
that was previously set up (Yes, No)[Yes]?
...
[ ERROR ] Failed to execute stage 'Closing up': 

OR

The specified storage location already contains a data domain. Is this an 
additional host setup (Yes, No)[Yes]? No
[ ERROR ] Re-deploying the engine VM over a previously (partially) deployed 
system is not supported. Please clean up the storage device or select a 
different one and retry.

NOTES:
1. This is oVirt v3.6 (legacy install, I know...)
2. We do have daily engine backups (.bak files) [till the day the power failed]

Any advice/assistance appreciated.

Thanks!

Roderick
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/73VDY7KLYBKCUXOUU4YTS4ZFGXN2ZX2U/


[ovirt-users] Re: Storage errors - member of pool, inaccessible and could not disconnect

2018-09-25 Thread Roderick Mooi

  
  
Hi, I understand we're on an old version and therefore there might
be some reluctance to assist (we are building a new cluster on the
latest Ovirt 4 but I need to maintain this in the meantime and plan
to upgrade once I can host the VMs on the new cluster). So, I found
some related bugs (and fixes) [e.g.
https://bugzilla.redhat.com/show_bug.cgi?id=1317699 and duplicates]
but haven't managed to completely resolve this [e.g. tweaking the
engine DB as per
https://bugzilla.redhat.com/show_bug.cgi?id=1351203]. Even if you
point me to the relevant redhat bugs or a previous list conversation
I'd appreciate it. Minor patch(es) to source files I can handle if
necessary...

Perhaps someone can help me with a copy of
https://access.redhat.com/solutions/2423321 (I'm not a redhat
subscriber so cannot access) and it follows from the bug reports
above...

Appreciated,

Roderick 

On 2018/09/20 2:55 PM, Roderick Mooi
  wrote:


  
  Anyone? Help please!
  
  On 2018/09/11 2:31 PM, Roderick Mooi
wrote:
  
  

Greetings!

I'm running a 3 node ovirt (3.6) hosted engine
(3.6.5.3-1.el7.centos) cluster with glusterfs (3.7.11) storage.
I keep getting this error for my hosted engine storage:

ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Failed
to stop monitoring domain
(sd_uuid=ff8ce693-5a52-47df-8e06-3443b4dc98a4): Error 900 from
stopMonitoringDomain: Storage domain is member of pool:
'domain=ff8ce693-5a52-47df-8e06-3443b4dc98a4'
ovirt_hosted_engine_ha.lib.image.Image:Teardown images
ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Disconnecting
the storage
ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Disconnecting
storage server

seemingly related to these ERROR messages in vdsm.log:
Storage.TaskManager.Task::(_setError) Task=`xyz`::Unexpected
error
> Storage domain is member of pool
or
> Domain is either partially accessible or entirely
inaccessible
or
Storage.HSM::(disconnectStorageServer) Could not disconnect from
storageServer

and then it updates config, mounts again, rinse, repeat, every
+-minute! (and seems to introduce side effects like engine state
changes, inability to migrate engine VM, hosted engine HA status
changing, etc.)

Extracts from logs attached. The main issues seem to stem from
the ERROR lines 214, 778, 1037, etc. in the vdsm.log...

Everything else seems to be working fine.

Please advise?

Thanks,

Roderick
  
  


  
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/4PVEAMK5AEHUEEZGC6V3ZVHVF5TBHRJ7/


[ovirt-users] Re: 3.6 - how to extend hosted engine disk size?

2018-09-20 Thread Roderick Mooi

  
  
Bump :-/

On 2018/09/10 2:10 PM, Roderick Mooi
  wrote:


  Greetings,

I've thought of trying the following:
1. Put cluster in global maintenance
2. Increase (extend) HE disk size in web UI
3. Reboot engine VM
(4. Increase disk/volume size in VM if needed)
5. Change maintenance to "none"

Will it work?

Thanks


  

On 2018/09/06 12:43 PM, Roderick Mooi wrote:


  Hi!

Running oVirt 3.6. I would like to increase (extend) the hosted engine
VM's disk size. What is the correct way?

Thanks,

Roderick


  


  
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/HY6VFBOXXNJD3BKGHTYZOXCOW2FYPO5D/


[ovirt-users] Re: Storage errors - member of pool, inaccessible and could not disconnect

2018-09-20 Thread Roderick Mooi

  
  
Anyone? Help please!

On 2018/09/11 2:31 PM, Roderick Mooi
  wrote:


  
Greetings!

I'm running a 3-node oVirt (3.6) hosted engine (3.6.5.3-1.el7.centos)
cluster with GlusterFS (3.7.11) storage. I keep getting this error for
my hosted engine storage:

ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Failed to stop monitoring domain (sd_uuid=ff8ce693-5a52-47df-8e06-3443b4dc98a4): Error 900 from stopMonitoringDomain: Storage domain is member of pool: 'domain=ff8ce693-5a52-47df-8e06-3443b4dc98a4'
ovirt_hosted_engine_ha.lib.image.Image:Teardown images
ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Disconnecting the storage
ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Disconnecting storage server

This seems related to these ERROR messages in vdsm.log:

Storage.TaskManager.Task::(_setError) Task=`xyz`::Unexpected error
> Storage domain is member of pool
or
> Domain is either partially accessible or entirely inaccessible
or
Storage.HSM::(disconnectStorageServer) Could not disconnect from storageServer

It then updates the config, mounts again, and repeats roughly every
minute. It also seems to introduce side effects such as engine state
changes, inability to migrate the engine VM, and the hosted engine HA
status changing.

Extracts from the logs are attached. The main issues seem to stem from
ERROR lines 214, 778, 1037, etc. in vdsm.log.

Everything else seems to be working fine.

Please advise?

Thanks,

Roderick


  
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/TUEMHACCECV2C7RJN3YRA6CF4CBUGEDF/


[ovirt-users] 3.6 - how to extend hosted engine disk size?

2018-09-10 Thread Roderick Mooi
Greetings,

I've thought of trying the following:
1. Put cluster in global maintenance
2. Increase (extend) HE disk size in web UI
3. Reboot engine VM
(4. Increase disk/volume size in VM if needed)
5. Change maintenance to "none"
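
The maintenance toggling in steps 1, 3 and 5 of the proposal above could be scripted roughly as below. This is an untested sketch, not a confirmed procedure for 3.6: the function names and the DRY_RUN switch are made up for illustration, and steps 2 and 4 (extending the disk in the web UI and growing the volume inside the VM) stay manual. Only the hosted-engine sub-commands themselves are real.

```shell
#!/bin/sh
# Dry-run by default: each command is printed instead of executed.
# Set DRY_RUN=0 on a hosted-engine host to run the commands for real.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print or execute a command depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1: stop the HA agents from acting on the engine VM.
he_extend_prepare() {
    run hosted-engine --set-maintenance --mode=global
    run hosted-engine --vm-status
}

# Step 3: restart the engine VM after the disk was extended in the web UI.
# In a real run, wait until --vm-status reports the VM as down before
# starting it again.
he_extend_restart_engine() {
    run hosted-engine --vm-shutdown
    run hosted-engine --vm-start
}

# Step 5: check the engine is healthy, then leave global maintenance.
he_extend_finish() {
    run hosted-engine --vm-status
    run hosted-engine --set-maintenance --mode=none
}
```

Calling he_extend_prepare, he_extend_restart_engine and he_extend_finish in that order with the default DRY_RUN=1 only prints the commands, which makes it easy to review the sequence before touching a live cluster.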

Will it work?

Thanks

> 
> On 2018/09/06 12:43 PM, Roderick Mooi wrote:
>> Hi!
>>
>> Running oVirt 3.6. I would like to increase (extend) the hosted engine
>> VM's disk size. What is the correct way?
>>
>> Thanks,
>>
>> Roderick
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/MPCQYHGHHE437DNONL3XYK222G36WLB2/


[ovirt-users] Re: Extend hosted engine disk size

2018-09-10 Thread Roderick Mooi
ping. please advise? thanks!

On 2018/09/06 12:43 PM, Roderick Mooi wrote:
> Hi!
> 
> Running oVirt 3.6. I would like to increase (extend) the hosted engine
> VM's disk size. What is the correct way?
> 
> Thanks,
> 
> Roderick
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/5UWEYHFDWIRZGYJVWP5BVDSQEWFWDZK4/


[ovirt-users] Re: Extend hosted engine disk size

2018-09-06 Thread Roderick Mooi
Previous msg got wrapped in html - apologies.

On 2018/09/06 12:43 PM, Roderick Mooi wrote:
> Hi!
> 
> Running oVirt 3.6. I would like to increase (extend) the hosted engine
> VM's disk size. What is the correct way?
> 
> Thanks,
> 
> Roderick
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/DXRB627YS6PQFCP45BI3EOTOY62H4FZU/


[ovirt-users] Extend hosted engine disk size

2018-09-06 Thread Roderick Mooi

  
  
Hi!

Running oVirt 3.6. I would like to increase (extend) the hosted
engine VM's disk size. What is the correct way?

Thanks,

Roderick
  
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/5WQMVURIOHKDLAMSGHFKTPH7SLZFBPTP/


Re: [ovirt-users] Could not associate gluster brick with correct network warning

2016-05-30 Thread Roderick Mooi
Hi

Yes, I created the volume using "gluster volume create ..." prior to
installing ovirt. Something I noticed is that there is no "gluster" bridge
on top of the interface I selected for the "Gluster Management" network -
could this be the problem?

Thanks,

Roderick

Roderick Mooi

Senior Engineer: South African National Research Network (SANReN)
Meraka Institute, CSIR

roder...@sanren.ac.za | +27 12 841 4111 | www.sanren.ac.za

On Fri, May 27, 2016 at 11:35 AM, Ramesh Nachimuthu <rnach...@redhat.com>
wrote:

> How did you create the volume? It looks like the volume was created using
> an FQDN in the Gluster CLI.
>
>
> Regards,
> Ramesh
>
> ----- Original Message -----
> > From: "Roderick Mooi" <roder...@sanren.ac.za>
> > To: "users" <users@ovirt.org>
> > Sent: Friday, May 27, 2016 2:34:51 PM
> > Subject: [ovirt-users] Could not associate gluster brick with correct
> network warning
> >
> > Good day
> >
> > I've setup a "Gluster Management" network in DC, cluster and all hosts.
> It is
> > appearing as "operational" in the cluster and all host networks look
> > correct. But I'm seeing this warning continually in the engine.log:
> >
> > 2016-05-27 08:56:58,988 WARN
> >
> [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturnForXmlRpc]
> > (DefaultQuartzScheduler_Worker-80) [] Could not associate brick
> > 'glustermount.host1:/gluster/data/brick' of volume
> > '7a25d2fb-1048-48d8-a26d-f288ff0e28cb' with correct network as no gluster
> > network found in cluster '0002-0002-0002-0002-02b8'
> >
> > This is on ovirt 3.6.5.
> >
> > Can anyone assist?
> >
> > Thanks,
> >
> > Roderick Mooi
> >
> > Senior Engineer: South African National Research Network (SANReN)
> > Meraka Institute, CSIR
> >
> > roder...@sanren.ac.za | +27 12 841 4111 | www.sanren.ac.za
> >
> > ___
> > Users mailing list
> > Users@ovirt.org
> > http://lists.ovirt.org/mailman/listinfo/users
> >
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] Could not associate gluster brick with correct network warning

2016-05-27 Thread Roderick Mooi
Good day

I've setup a "Gluster Management" network in DC, cluster and all hosts. It
is appearing as "operational" in the cluster and all host networks look
correct. But I'm seeing this warning continually in the engine.log:

2016-05-27 08:56:58,988 WARN
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturnForXmlRpc]
(DefaultQuartzScheduler_Worker-80) [] Could not associate brick
'glustermount.host1:/gluster/data/brick' of volume
'7a25d2fb-1048-48d8-a26d-f288ff0e28cb' with correct network as no gluster
network found in cluster '0002-0002-0002-0002-02b8'

This is on ovirt 3.6.5.

Can anyone assist?

Thanks,

Roderick Mooi

Senior Engineer: South African National Research Network (SANReN)
Meraka Institute, CSIR

roder...@sanren.ac.za | +27 12 841 4111 | www.sanren.ac.za


Re: [ovirt-users] How to change host networks?

2016-05-25 Thread Roderick Mooi
Good day

Can anyone assist me with this please? - I'm starting over and would like
to know how to change network settings correctly post-installation. Must I
only change it in the GUI and then sync to have the changes applied to the
hosts? Does this work for the management network?

Thanks,

Roderick

Roderick Mooi

Senior Engineer: South African National Research Network (SANReN)
Meraka Institute, CSIR

roder...@sanren.ac.za | +27 12 841 4111 | www.sanren.ac.za

On Thu, May 19, 2016 at 2:16 PM, Roderick Mooi <roder...@sanren.ac.za>
wrote:

> Hi!
>
> Both. My immediate problem is in email 17 May "Cannot sync networks"
> (existing setup) with no answer as yet. But I need to do this again even if
> I start over (see detail below - I tried but it isn't working so if there's
> a better way I'd like to know)...
>
> So essentially, the network I use for setup isn't the same as the final
> production network for various reasons - one of them being the installers
> requirements re reachability and connectivity; the other that the test lab
> has a different subnet and I need to get it working there first; third
> requirement is isolation of the management network once setup is complete.
>
> I've almost managed to achieve this by putting the engine into global
> maintenance, shut down engine, manually changing the network config on the
> hosts (incl vdsm) and engine, rebooting all hosts (which eventually brings
> the engine back up), re-configuring logical networks and syncing the
> networks - this all works except the sync - it goes on for a few minutes,
> then reports lost connectivity to the host and I presume resets the network
> config - i.e. the sync doesn't succeed. Besides the hosts showing network
> not synced all functionality except migration seems to be working (all
> hosts visible and reachable and I can start a VM on any of them)
>
> Thanks very much,
>
> Roderick
>
> Roderick Mooi
>
> Senior Engineer: South African National Research Network (SANReN)
> Meraka Institute, CSIR
>
> roder...@sanren.ac.za | +27 12 841 4111 | www.sanren.ac.za
>
> On Wed, May 18, 2016 at 3:25 PM, Shmuel Melamud <smela...@redhat.com>
> wrote:
>
>> Hi!
>>
>> Do you have an existing setup that you want to migrate, or are you only
>> planning?
>>
>> Shmuel
>>
>> On Wed, May 18, 2016 at 2:04 PM, Roderick Mooi <roder...@sanren.ac.za>
>> wrote:
>>
>>> Good day
>>>
>>> I would like to change the IP addresses of the host networks used by
>>> Ovirt including the ovirtmgmt bridge. I need to move them to a different
>>> subnet post installation. What is the best/safest way to do this?
>>>
>>> Thanks,
>>>
>>> Roderick Mooi
>>>
>>> Senior Engineer: South African National Research Network (SANReN)
>>> Meraka Institute, CSIR
>>>
>>> roder...@sanren.ac.za | +27 12 841 4111 | www.sanren.ac.za
>>>
>>>
>>>
>>
>


Re: [ovirt-users] Cannot sync networks

2016-05-25 Thread Roderick Mooi
Hi Alona

Thanks but this is what I already tried between steps 5 and 6 - when I
click “ok” it tries to sync and eventually seems to give up and roll back.
Will it help if I uncheck “verify engine to host connectivity”? (I’m
surprised that the verification fails as technically the network I’m trying
to set is the one that is currently being used by the host and engine - all
I want to do is update the DC to match….)

Regards,

Roderick

Roderick Mooi

Senior Engineer: South African National Research Network (SANReN)
Meraka Institute, CSIR

roder...@sanren.ac.za | +27 12 841 4111 | www.sanren.ac.za

On Sun, May 22, 2016 at 2:37 PM, Alona Kaplan <alkap...@redhat.com> wrote:

>
>
> ----- Original Message -----
> > From: "Roderick Mooi" <roder...@sanren.ac.za>
> > To: users@ovirt.org
> > Sent: Tuesday, May 17, 2016 12:59:03 PM
> > Subject: [ovirt-users] Cannot sync networks
> >
> > Good day
> >
> > On oVirt 3.6.4 HE. I manually reconfigured host networks by doing the
> > following:
> > 1. Changed to global maintenance mode
> > 2. Shutdown the hosted engine
> > 3. on each host:
> > a. systemctl stop ovirt-ha-agent && systemctl stop ovirt-ha-broker &&
> > systemctl stop vdsmd
> > b. updated /var/lib/vdsm/persistence/netconf/nets to match the required
> > config
> > c. updated ifcfg files to match
> > d. rebooted each host
> > 4. When all is up and running again, verified network connectivity and
> > settings on each host - ok.
> > 5. Logged into engine web UI - all hosts detected but show network
> > out-of-sync (see attached example).
> > 6. Individual sync / sync all networks runs for a while till login time
> out.
> > Log back in and go to host - still shows out-of-sync (even next day or
> after
> > rebooting again).
> > 7. Eventually got one host to sync (not hosting any VMs and rebooted).
> > 8. Whatever I try cannot get the other hosts to sync (even when moving
> all
> > VMs off and rebooting).
> >
> > Any ideas? Do I have to manually edit the database and change the network
> > settings for the DC - if so, how do I do this? (the host config is what I
> > want - DC is old config.)
>
> Syncing a network means applying the DC config on the host (overriding the
> actual host config).
> Since, as you mentioned, the DC config is old and no longer relevant,
> I guess it is failing because you're trying to apply the wrong
> address/gateway/netmask to the management network. That causes the engine
> to lose connectivity to the host, and it eventually ends in a rollback.
>
> What you need to do is to update the DC config to be same as the host
> config.
> Since the out-of-sync properties in the attached image are address,
> gateway and netmask of 'ovirtmgmt',
> you need to open the 'setup networks' dialog and edit the 'ovirtmgmt'
> network (clicking on the pencil icon).
> In order to be able to edit the network, you first have to mark the 'sync
> network' checkbox in the dialog. Then modify the address/gateway/netmask to
> the host value.
> Then click 'ok' and perform the setup networks ('ok' to the setup networks
> main dialog).
>
> >
> > Alternatively, if I have to start all over again, what is the correct
> way for
> > changing networks post HE install (I need this as the network for
> > installation and testing looks different to the final production
> network)?
> >
> > Thanks very much,
> >
> > Roderick
> >
> >
> >
> >
>


Re: [ovirt-users] How to change host networks?

2016-05-19 Thread Roderick Mooi
Hi!

Both. My immediate problem is in email 17 May "Cannot sync networks"
(existing setup) with no answer as yet. But I need to do this again even if
I start over (see detail below - I tried but it isn't working so if there's
a better way I'd like to know)...

So essentially, the network I use for setup isn't the same as the final
production network, for several reasons: the installer's requirements
regarding reachability and connectivity; the test lab having a different
subnet, where I need to get it working first; and the need to isolate
the management network once setup is complete.

I've almost managed to achieve this by putting the engine into global
maintenance, shut down engine, manually changing the network config on the
hosts (incl vdsm) and engine, rebooting all hosts (which eventually brings
the engine back up), re-configuring logical networks and syncing the
networks - this all works except the sync - it goes on for a few minutes,
then reports lost connectivity to the host and I presume resets the network
config - i.e. the sync doesn't succeed. Besides the hosts showing network
not synced all functionality except migration seems to be working (all
hosts visible and reachable and I can start a VM on any of them)

Thanks very much,

Roderick

Roderick Mooi

Senior Engineer: South African National Research Network (SANReN)
Meraka Institute, CSIR

roder...@sanren.ac.za | +27 12 841 4111 | www.sanren.ac.za

On Wed, May 18, 2016 at 3:25 PM, Shmuel Melamud <smela...@redhat.com> wrote:

> Hi!
>
> Do you have an existing setup that you want to migrate, or are you only
> planning?
>
> Shmuel
>
> On Wed, May 18, 2016 at 2:04 PM, Roderick Mooi <roder...@sanren.ac.za>
> wrote:
>
>> Good day
>>
>> I would like to change the IP addresses of the host networks used by
>> Ovirt including the ovirtmgmt bridge. I need to move them to a different
>> subnet post installation. What is the best/safest way to do this?
>>
>> Thanks,
>>
>> Roderick Mooi
>>
>> Senior Engineer: South African National Research Network (SANReN)
>> Meraka Institute, CSIR
>>
>> roder...@sanren.ac.za | +27 12 841 4111 | www.sanren.ac.za
>>
>>
>>
>


[ovirt-users] How to change host networks?

2016-05-18 Thread Roderick Mooi
Good day

I would like to change the IP addresses of the host networks used by Ovirt
including the ovirtmgmt bridge. I need to move them to a different subnet
post installation. What is the best/safest way to do this?

Thanks,

Roderick Mooi

Senior Engineer: South African National Research Network (SANReN)
Meraka Institute, CSIR

roder...@sanren.ac.za | +27 12 841 4111 | www.sanren.ac.za


[ovirt-users] Cannot sync networks

2016-05-17 Thread Roderick Mooi
Good day

On oVirt 3.6.4 HE. I manually reconfigured host networks by doing the
following:
1. Changed to global maintenance mode
2. Shutdown the hosted engine
3. on each host:
a. systemctl stop ovirt-ha-agent && systemctl stop ovirt-ha-broker &&
systemctl stop vdsmd
b. updated /var/lib/vdsm/persistence/netconf/nets to match the required
config
c. updated ifcfg files to match
d. rebooted each host
4. When all is up and running again, verified network connectivity and
settings on each host - ok.
5. Logged into engine web UI - all hosts detected but show network
out-of-sync (see attached example).
6. Individual sync / sync all networks runs for a while until the login
times out. Log back in and go to host - still shows out-of-sync (even the
next day or after rebooting again).
7. Eventually got one host to sync (not hosting any VMs and rebooted).
8. Whatever I try, I cannot get the other hosts to sync (even when moving
all VMs off and rebooting).

Any ideas? Do I have to manually edit the database and change the network
settings for the DC - if so, how do I do this? (the host config is what I
want - DC is old config.)

Alternatively, if I have to start all over again, what is the correct way
for changing networks post HE install (I need this as the network for
installation and testing looks different to the final production network)?

Thanks very much,

Roderick


Re: [ovirt-users] ovirt glusterfs performance

2016-04-06 Thread Roderick Mooi
Hi Ravi and colleagues

(apologies for hijacking this thread but I’m not sure where else to report this 
(and it is related).)

With gluster 3.7.10, running
#gluster volume set  group virt
fails with:
volume set: failed: option : eager-lock does not exist
Did you mean eager-lock?

I had to remove the eager-lock setting from /var/lib/glusterd/groups/virt to 
get this to work. It seems like setting eager-lock has been removed from latest 
gluster. Is this correct? Either way, is there anything else I should do?
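
For anyone hitting the same error, the manual edit described above could be done along these lines. This is only a sketch of the workaround I used, not an endorsed fix: the function name is made up, the matching is a plain substring match on "eager-lock", and a backup copy is kept next to the file so the change can be reverted.

```shell
# Remove any eager-lock line from a gluster group profile, keeping a backup.
# $1: path to the group file (on a real host: /var/lib/glusterd/groups/virt)
strip_eager_lock() {
    group_file="$1"
    # Nothing to do if the option is not present in the profile.
    grep -q 'eager-lock' "$group_file" || return 0
    # Keep a copy of the original profile next to it.
    cp -p "$group_file" "$group_file.bak"
    # Filter into a temp file and write back in place, so the original
    # file's ownership and permissions are preserved.
    grep -v 'eager-lock' "$group_file" > "$group_file.tmp"
    cat "$group_file.tmp" > "$group_file"
    rm -f "$group_file.tmp"
}
```

On a host this would be called as strip_eager_lock /var/lib/glusterd/groups/virt, after which the "gluster volume set ... group virt" command can be retried with the volume's own name.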

Cheers,

Roderick

> On 12 Feb 2016, at 6:18 AM, Ravishankar N  wrote:
> 
> Hi Bill,
> Can you enable virt-profile setting for your volume and see if that helps? 
> You need to enable this optimization when you create the volume using ovirt, 
> or use the following command for an existing volume:
> 
> #gluster volume set  group virt
> 
> -Ravi
> 
> 
> On 02/12/2016 05:22 AM, Bill James wrote:
>> My apologies, I'm showing how much of a noob I am.
>> Ignore last direct to gluster numbers, as that wasn't really glusterfs.
>> 
>> 
>> [root@ovirt2 test ~]# mount -t glusterfs ovirt2-ks.test.j2noc.com:/gv1 
>> /mnt/tmp/
>> [root@ovirt2 test ~]# time dd if=/dev/zero of=/mnt/tmp/testfile2 bs=1M 
>> count=1000 oflag=direct
>> 1048576000 bytes (1.0 GB) copied, 65.8596 s, 15.9 MB/s
>> 
>> That's more how I expected, it is pointing to glusterfs performance.
>> 
>> 
>> 
>> On 02/11/2016 03:27 PM, Bill James wrote:
>>> don't know if it helps, but I ran a few more tests, all from the same 
>>> hardware node.
>>> 
>>> The VM:
>>> [root@billjov1 ~]# time dd if=/dev/zero of=/root/testfile bs=1M count=1000 
>>> oflag=direct
>>> 1048576000 bytes (1.0 GB) copied, 62.5535 s, 16.8 MB/s
>>> 
>>> Writing directly to gluster volume:
>>> [root@ovirt2 test ~]# time dd if=/dev/zero 
>>> of=/gluster-store/brick1/gv1/testfile bs=1M count=1000 oflag=direct
>>> 1048576000 bytes (1.0 GB) copied, 9.92048 s, 106 MB/s
>>> 
>>> 
>>> Writing to NFS volume:
>>> [root@ovirt2 test ~]# time dd if=/dev/zero of=/mnt/storage/qa/testfile 
>>> bs=1M count=1000 oflag=direct
>>> 1048576000 bytes (1.0 GB) copied, 10.5776 s, 99.1 MB/s
>>> 
>>> NFS & Gluster are using the same interface. Tests were not run at same time.
>>> 
>>> This would suggest my problem isn't glusterfs, but the VM performance.
>>> 
>>> 
>>> 
>>> On 02/11/2016 03:13 PM, Bill James wrote:
>>>> xml attached.
>>>>
>>>>
>>>> On 02/11/2016 12:28 PM, Nir Soffer wrote:
>>>>> On Thu, Feb 11, 2016 at 8:27 PM, Bill James wrote:
>>>>>> thank you for the reply.
>>>>>>
>>>>>> We setup gluster using the names associated with  NIC 2 IP.
>>>>>>   Brick1: ovirt1-ks.test.j2noc.com:/gluster-store/brick1/gv1
>>>>>>   Brick2: ovirt2-ks.test.j2noc.com:/gluster-store/brick1/gv1
>>>>>>   Brick3: ovirt3-ks.test.j2noc.com:/gluster-store/brick1/gv1
>>>>>>
>>>>>> That's NIC 2's IP.
>>>>>> Using 'iftop -i eno2 -L 5 -t' :
>>>>>>
>>>>>> dd if=/dev/zero of=/root/testfile bs=1M count=1000 oflag=direct
>>>>>> 1048576000 bytes (1.0 GB) copied, 68.0714 s, 15.4 MB/s
>>>>> Can you share the xml of this vm? You can find it in vdsm log,
>>>>> at the time you start the vm.
>>>>>
>>>>> Or you can do (on the host):
>>>>>
>>>>> # virsh
>>>>> virsh # list
>>>>> (username: vdsm@ovirt password: shibboleth)
>>>>> virsh # dumpxml vm-id
>>>>>
>>>>>> Peak rate (sent/received/total):  281Mb 5.36Mb 282Mb
>>>>>> Cumulative (sent/received/total):  1.96GB 14.6MB 1.97GB
>>>>>>
>>>>>> gluster volume info gv1:
>>>>>>   Options Reconfigured:
>>>>>>   performance.write-behind-window-size: 4MB
>>>>>>   performance.readdir-ahead: on
>>>>>>   performance.cache-size: 1GB
>>>>>>   performance.write-behind: off
>>>>>>
>>>>>> performance.write-behind: off didn't help.
>>>>>> Neither did any other changes I've tried.
>>>>>>
>>>>>>
>>>>>> There is no VM traffic on this VM right now except my test.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 02/10/2016 11:55 PM, Nir Soffer wrote:
>>>>>>> On Thu, Feb 11, 2016 at 2:42 AM, Ravishankar N wrote:
>>>>>>>> +gluster-users
>>>>>>>>
>>>>>>>> Does disabling 'performance.write-behind' give a better throughput?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 02/10/2016 11:06 PM, Bill James wrote:
>>>>>>>>> I'm setting up a ovirt cluster using glusterfs and noticing not
>>>>>>>>> stellar performance.
>>>>>>>>> Maybe my setup could use some adjustments?
>>>>>>>>>
>>>>>>>>> 3 hardware nodes running centos7.2, glusterfs 3.7.6.1, ovirt
>>>>>>>>> 3.6.2.6-1.
>>>>>>>>> Each node has 8 spindles configured in 1 array which is split using
>>>>>>>>> LVM with one logical volume for system and one for gluster.
>>>>>>>>> They each have 4 NICs,
>>>>>>>>>    NIC1 = ovirtmgmt
>>>>>>>>>    NIC2 = gluster  (1GbE)
>>>>>>> How do you ensure that gluster trafic is using