[ovirt-users] Re: Weird problem starting VMs in oVirt-4.4

2020-06-17 Thread Oliver Leinfelder

Krutika Dhananjay wrote:
Yes, so the bug has been fixed upstream, and the backports to release-7 
and release-8 of gluster are pending merge. The fix should be available in 
the next .x release of gluster-7 and 8. Until then, as Nir suggested, 
please turn off performance.stat-prefetch on your volumes.



It looks like I ran exactly into this bug when I wrote this:

https://lists.ovirt.org/archives/list/users@ovirt.org/message/3BQMCIGCEOLIOV3LSW47GVPKSMOOK7IL/

During my tests, the deployment went through on the third attempt, only 
for me to discover that the problem persisted: sure enough, it came back 
to haunt me when I rebooted the hosted engine.


I'm not entirely sure I fully understand the problem. What I did, of 
course, was this:

# gluster volume set engine performance.stat-prefetch off

It doesn't help with my currently deployed HE: it gets stuck at the 
graphical BIOS screen. I can interact with that screen using "hosted-engine 
--console", but the best I can do there is "Reset", which turns the 
whole VM off.
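
The workaround quoted above has to be applied per volume. A minimal Python sketch follows, assuming it runs on a storage host where the gluster CLI is on the PATH. Only the volume name "engine" comes from this thread; "data" is a placeholder for whatever the second volume is called. By default it only builds the commands so they can be reviewed first.

```python
import subprocess


def stat_prefetch_off_cmd(volume):
    """Return the argv that disables stat-prefetch on one Gluster volume."""
    return ["gluster", "volume", "set", volume,
            "performance.stat-prefetch", "off"]


def disable_stat_prefetch(volumes, dry_run=True):
    """Build the workaround command for each volume; run it unless dry_run."""
    commands = [stat_prefetch_off_cmd(v) for v in volumes]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if gluster reports a failure
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # "data" is a hypothetical volume name, not taken from the thread
    for cmd in disable_stat_prefetch(["engine", "data"]):
        print(" ".join(cmd))
```

Note that this only changes the volume option going forward; as described above, it does not repair an engine disk that was already affected.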


Assuming something got lost while stat-prefetch was still turned on: 
is there any way to fix this? Would a redeployment reliably fix it?


Bonus question: I'm using oVirt Node for the VM and Gluster hosts. Will 
a fix be coming by way of package updates for this in the foreseeable 
future?


Thank you
Oliver
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/EFDKZOTY6I4QBGTXB275J2XCXOMOTBI6/


[ovirt-users] Re: oVirt 4.4.0 HE deployment on GlusterFS fails during health check

2020-06-13 Thread Oliver Leinfelder
Hi,

> Your gluster mount option is not correct.
> You need 'backup-volfile-servers=storagehost2:storagehost3' (without the
> volume name as they all have that volume).


Yes, of course. I'm sorry, the appended volume name was a mistake I made
in the email, not during deployment, where I only specified the FQDNs
without the volname.

As mentioned, the mount generally seems to work, as data is written during
deployment. It fails later during health check :-(
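
To make the corrected mount option concrete, here is a small Python sketch that builds it from bare host names and rejects entries that still carry a volume path. The helper name is made up for illustration, and the host names are the placeholders used in this thread. Rejecting any ":" also rules out IPv6 literals, which is a simplification of this sketch.

```python
def backup_volfile_option(backup_hosts):
    """Build the backup-volfile-servers mount option from bare host names."""
    for host in backup_hosts:
        # A '/' or ':' means something like 'storagehost2:/engine' slipped in
        if "/" in host or ":" in host:
            raise ValueError(f"expected a bare host name, got {host!r}")
    return "backup-volfile-servers=" + ":".join(backup_hosts)


if __name__ == "__main__":
    print(backup_volfile_option(["storagehost2", "storagehost3"]))
```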

Best regards
Oliver
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/SP7ZNIIXGHQ73452J4D6FYWQT2DQIKHR/


[ovirt-users] oVirt 4.4.0 HE deployment on GlusterFS fails during health check

2020-06-13 Thread Oliver Leinfelder

Hi,

I have the following two components:

1.) A freshly installed VM host (oVirt Node 4.4.0 release ISO)
2.) 3 storage hosts, also freshly installed from oVirt Node 4.4.0 
release ISO


The storage hosts have been successfully installed with Gluster (through 
Cockpit). They have two volumes, both of which I can mount and 
read/write from a client.


On the VM host, I ran "hosted-engine --deploy" (no backups imported).

When prompted for storage, I answered "glusterfs" and specified 
"storagehost1:/engine" as storage for the HE deployment. For mount 
options, I specified 
"backup-volfile-servers=storagehost2:/engine:storagehost3:/engine"


(Not the real hostnames, but all of them are resolvable via internal DNS)

Everything seemed to work fine, and I also saw the "engine" volume become 
populated with data. At some point I could ping the HE and log in via SSH.


When the setup proceeded to the health check, it failed and the whole 
process was aborted :-(


"hosted-engine --vm-status" reported "failed liveliness check" when it 
was reachable via SSH. At some point the engine went down and, to my 
surprise, showed a grub prompt after the restart when I ran 
"hosted-engine --console".


[ INFO  ] TASK [ovirt.hosted_engine_setup : Check engine VM health]
[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 180, "changed": 
true, "cmd": ["hosted-engine", "--vm-status", "--json"], "delta": 
"0:00:00.160595", "end": "2020-06-12 17:50:05.675774", "rc": 0, "start": 
"2020-06-12 17:50:05.515179", "stderr": "", "stderr_lines": [],
"stdout": "{\"1\": {\"host-id\": 1, \"host-ts\": 11528, \"score\": 3400, 
\"engine-status\": {\"vm\": \"up\", \"health\": \"bad\", \"detail\": 
\"Powering down\", \"reason\": \"failed liveliness check\"}, 
\"hostname\": \"vmhost\", \"maintenance\": false, \"stopped\": false, 
\"crc32\": \"2c447835\", \"conf_on_shared_storage\": true, 
\"local_conf_timestamp\": 11528, \"extra\": 
\"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=11528 
(Fri Jun 12 17:49:57 
2020)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=11528 (Fri Jun 12 
17:49:57 
2020)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStop\\nstopped=False\\ntimeout=Thu 
Jan  1 04:12:48 1970\\n\", \"live-data\": true}, \"global_maintenance\": 
false}", "stdout_lines": ["{\"1\": {\"host-id\": 1, \"host-ts\": 11528, 
\"score\": 3400, \"engine-status\": {\"vm\": \"up\", \"health\": 
\"bad\", \"detail\": \"Powering down\", \"reason\": \"failed liveliness 
check\"}, \"hostname\": \"vmhost\", \"maintenance\": false, \"stopped\": 
false, \"crc32\": \"2c447835\", \"conf_on_shared_storage\": true, 
\"local_conf_timestamp\": 11528, \"extra\": 
\"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=11528 
(Fri Jun 12 17:49:57 
2020)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=11528 (Fri Jun 12 
17:49:57 
2020)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStop\\nstopped=False\\ntimeout=Thu 
Jan  1 04:12:48 1970\\n\", \"live-data\": true}, \"global_maintenance\": 
false}"]}
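
The JSON in the "stdout" field above is easier to read once decoded. The following Python sketch pulls out the per-host engine status; the sample string is a shortened copy of the output above (crc32, timestamps and the "extra" field are left out), and the function name is made up for illustration.

```python
import json

# Shortened copy of the "hosted-engine --vm-status --json" output shown above
SAMPLE_STATUS = (
    '{"1": {"host-id": 1, "host-ts": 11528, "score": 3400, '
    '"engine-status": {"vm": "up", "health": "bad", '
    '"detail": "Powering down", "reason": "failed liveliness check"}, '
    '"hostname": "vmhost", "maintenance": false, "stopped": false, '
    '"live-data": true}, "global_maintenance": false}'
)


def summarize_vm_status(raw):
    """Return one summary dict per host entry in the status JSON."""
    status = json.loads(raw)
    summary = []
    for host in status.values():
        # Top-level keys such as "global_maintenance" are plain values
        if not isinstance(host, dict):
            continue
        engine = host.get("engine-status", {})
        summary.append({
            "hostname": host.get("hostname"),
            "vm": engine.get("vm"),
            "health": engine.get("health"),
            "detail": engine.get("detail"),
            "reason": engine.get("reason"),
        })
    return summary


if __name__ == "__main__":
    for entry in summarize_vm_status(SAMPLE_STATUS):
        print(entry)
```

For the output above this shows the VM as "up" but with health "bad", which matches what the health check task kept seeing for all 180 attempts.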


A second attempt failed at exactly the same stage.

I can see the following in the setup log:

ovirt-hosted-engine-setup-20200612151212-j9zwd2.log:
2020-06-12 17:33:18,314+0200 DEBUG 
otopi.ovirt_hosted_engine_setup.ansible_utils 
ansible_utils._process_output:103 {'msg': 'non-zero return code', 'cmd': 
['hosted-engine', '--reinitialize-lockspace', '--force'], 'stdout': '', 
'stderr': 'Traceback (most recent call last):\n
  File "/usr/lib64/python3.6/runpy.py", line 193, in 
_run_module_as_main\n    "__main__", mod_spec)\n  File 
"/usr/lib64/python3.6/runpy.py", line 85, in _run_code\n exec(code, 
run_globals)\n  File 
"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_setup/reinitialize_lockspace.py", 
line 30, in \n ha_cli.reset_lockspace(force)\n  File 
"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/client/client.py", 
line 286, in reset_lockspace\n    stats = 
broker.get_stats_from_storage()\n  File 
"/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", 
line 148, in get_stats_from_storage\n    result = 
self._proxy.get_stats()\n  File 
"/usr/lib64/python3.6/xmlrpc/client.py", line 1112, in __call__\n return 
self.__send(self.__name, args)\n  File 
"/usr/lib64/python3.6/xmlrpc/client.py", line 1452, in __request\n    
verbose=self.__verbose\n  File 
"/usr/lib64/python3.6/xmlrpc/client.py", line 1154, in request\n return 
self.single_request(host, handler, request_body, verbose)\n File 
"/usr/lib64/python3.6/xmlrpc/client.py", line 1166, in single_request\n    
http_conn = self.send_request(host, handler, request_body, verbose)\n  File 
"/usr/lib64/python3.6/xmlrpc/client.py", line 1279, in send_request\n    
self.send_content(connection, request_body)\n File 
"/usr/lib64/python3.6/xmlrpc/client.py", line 1309, in send_content\n    
connection.endheaders(request_body)\n  File 
"/usr/lib64/python3.6/http/client.py", line 1249, in endheaders\n 
self._send_output(message_body, 

[ovirt-users] Re: oVirt 4.4: Self-hosted engine deployment fails with backup restore from 4.3 engine

2020-05-27 Thread Oliver Leinfelder

Hi,


I think I know (it's hard to tell without more logs, but anyway):

It's because your PKI was expired and thus renewed. If you used the
command line to restore/deploy, you were also asked:

 'Renew engine CA on restore if needed? Please notice '
 'that if you choose Yes, all hosts will have to be '
 'later manually reinstalled from the engine. '
 '(@VALUES@)[@DEFAULT@]: '

and probably replied Yes.

You have two options:

1. Try again, and reply No.
2. Run first engine-setup (can add --offline to prevent it from
upgrading) on your old engine. You should be prompted there, and reply
Yes, and then take a backup after it finishes and try again to restore
with that backup.

In any case, that's a b


Your guess was right, I think (btw: I checked the ca.pem in the old HE, and 
it is valid until 2028). I took the easy way and replied with "No".
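
For anyone wanting to repeat the expiry check: one way is to run "openssl x509 -noout -enddate -in ca.pem" on the old engine (the file is assumed here to live at /etc/pki/ovirt-engine/ca.pem) and compare the date it prints. The Python sketch below only parses that "notAfter=" line; the sample date is invented, since the thread only says the CA is valid until 2028, and it assumes the default C locale for month names.

```python
from datetime import datetime


def parse_not_after(openssl_line):
    """Parse a 'notAfter=...' line as printed by openssl x509 -enddate."""
    prefix = "notAfter="
    if not openssl_line.startswith(prefix):
        raise ValueError(f"unexpected openssl output: {openssl_line!r}")
    stamp = openssl_line[len(prefix):].strip()
    # Example format: "Mar 14 09:26:53 2028 GMT"
    return datetime.strptime(stamp, "%b %d %H:%M:%S %Y %Z")


def is_expired(openssl_line, now):
    """True if the certificate end date is at or before 'now'."""
    return parse_not_after(openssl_line) <= now


if __name__ == "__main__":
    line = "notAfter=Mar 14 09:26:53 2028 GMT"  # invented sample value
    print(parse_not_after(line), is_expired(line, datetime(2020, 5, 27)))
```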


I will open a bug (my first one :-)). Is this one for the 
ovirt-hosted-engine-setup category?


Anyway, the setup got much further but still did not complete. It now 
fails with this error:


[ ERROR ] ovirtsdk4.AuthError: Error during SSO authentication 
server_error : PKIX path building failed: 
sun.security.provider.certpath.SunCertPathBuilderException: unable to 
find valid certification path to requested target
[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 50, "changed": 
false, "msg": "Error during SSO authentication server_error : PKIX path 
building failed: 
sun.security.provider.certpath.SunCertPathBuilderException: unable to 
find valid certification path to requested target"}


I can see this in /var/log/engine.log in the HE.

2020-05-27 16:10:43,695+02 ERROR 
[org.ovirt.engine.core.sso.utils.SsoUtils] (default task-8) [] 
OAuthException server_error: PKIX path building failed: 
sun.security.provider.certpath.SunCertPathBuilderException: unable to 
find valid certification path to requested target
2020-05-27 16:10:53,962+02 INFO 
[org.ovirt.engine.extension.aaa.jdbc.core.Authentication] (default 
task-8) [] locking user: admin due to interval failures
2020-05-27 16:10:58,956+02 ERROR 
[org.ovirt.engine.core.sso.utils.SsoUtils] (default task-8) [] 
OAuthException server_error: PKIX path building failed: 
sun.security.provider.certpath.SunCertPathBuilderException: unable to 
find valid certification path to requested target
2020-05-27 16:11:09,222+02 INFO 
[org.ovirt.engine.extension.aaa.jdbc.core.Authentication] (default 
task-8) [] locking user: admin due to interval failures
2020-05-27 16:11:14,217+02 ERROR 
[org.ovirt.engine.core.sso.utils.SsoUtils] (default task-8) [] 
OAuthException server_error: PKIX path building failed: 
sun.security.provider.certpath.SunCertPathBuilderException: unable to 
find valid certification path to requested target
2020-05-27 16:11:24,484+02 INFO 
[org.ovirt.engine.extension.aaa.jdbc.core.Authentication] (default 
task-8) [] locking user: admin due to interval failures
2020-05-27 16:11:29,480+02 ERROR 
[org.ovirt.engine.core.sso.utils.SsoUtils] (default task-8) [] 
OAuthException server_error: PKIX path building failed: 
sun.security.provider.certpath.SunCertPathBuilderException: unable to 
find valid certification path to requested target


I tried to dig around a bit more in /var/log of the HE to get more 
details but couldn't find anything relevant there.


Thanks in advance!

Best regards
Oli

List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/PXPI2Z65JFXR7VCWNVSIPOAIEC4EZX3I/


[ovirt-users] Re: oVirt 4.4: Self-hosted engine deployment fails with backup restore from 4.3 engine

2020-05-27 Thread Oliver Leinfelder

Hi,

Yedidyah Bar David wrote:


In any case (perhaps not relevant to you right now, if indeed engine-setup
succeeded), usually the engine vm is left running at the end of a failed
deploy. If it's still the local vm, you can find its IP address by searching
the ansible logs for local_vm_ip, then you can ssh to it from the host.

For fixing the "empty engine-logs dirs", now pushed this:

https://github.com/oVirt/ovirt-ansible-hosted-engine-setup/pull/325

Didn't test yet, it's just a guess.


I followed up on your remark that the engine may indeed be running. And 
it is, sorry for not seeing this earlier.
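
The search for local_vm_ip suggested above can be sketched in Python as follows. The exact format of the ansible log lines is not shown in this thread, so the sketch simply looks for any IPv4 address on lines that mention local_vm_ip. The log directory is the one named elsewhere in this thread, the "*.log" glob pattern is an assumption, and the sample line in the usage example is invented.

```python
import re
from pathlib import Path

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_local_vm_ips(lines):
    """Collect IPv4 addresses from lines that mention local_vm_ip."""
    ips = []
    for line in lines:
        if "local_vm_ip" not in line:
            continue
        for ip in IPV4.findall(line):
            if ip not in ips:
                ips.append(ip)
    return ips


def scan_log_dir(log_dir="/var/log/ovirt-hosted-engine-setup"):
    """Scan every *.log file in log_dir (file pattern is an assumption)."""
    found = []
    for path in sorted(Path(log_dir).glob("*.log")):
        with path.open(errors="replace") as handle:
            for ip in find_local_vm_ips(handle):
                if ip not in found:
                    found.append(ip)
    return found


if __name__ == "__main__":
    # Invented sample line; real log lines may look different
    print(find_local_vm_ips(["... local_vm_ip ... 192.0.2.10 ..."]))
```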


Anyway, I was thus able to take a look in /var/log/ovirt-engine/setup in 
the HE VM and found the following error (I found a couple more 
"suspicious" lines, but this one stands out).


2020-05-27 00:17:09,660+0200 DEBUG otopi.context context._executeMethod:145 method exception
Traceback (most recent call last):
  File "/usr/lib64/python3.6/site-packages/M2Crypto/BIO.py", line 279, in openfile
    f = open(filename, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/etc/pki/ovirt-engine/qemu-ca.pem'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-engine/setup/bin/../plugins/ovirt-engine-setup/ovirt-engine/pki/ca.py", line 699, in _miscUpgrade
    if self._expired(self._x509_load_cert(ca_file)):
  File "/usr/share/ovirt-engine/setup/bin/../plugins/ovirt-engine-setup/ovirt-engine/pki/ca.py", line 94, in _x509_load_cert
    res = X509.load_cert(f)
  File "/usr/lib64/python3.6/site-packages/M2Crypto/X509.py", line 802, in load_cert
    with BIO.openfile(file) as bio:
  File "/usr/lib64/python3.6/site-packages/M2Crypto/BIO.py", line 281, in openfile
    raise BIOError(ex.args)
M2Crypto.BIO.BIOError: (2, 'No such file or directory')
2020-05-27 00:17:09,663+0200 ERROR otopi.context context._executeMethod:154 Failed to execute stage 'Misc configuration': (2, 'No such file or directory')


Best regards
Oli
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/IU2N3PECOV6VFGBWKXMHXDSEKCTVNZTB/


[ovirt-users] Re: oVirt 4.4: Self-hosted engine deployment fails with backup restore from 4.3 engine

2020-05-27 Thread Oliver Leinfelder

Hi there,

You should also see one or more ERROR messages, can you check/post them? 


There is one error message that immediately follows, if that helps:

2020-05-27 00:17:12,397+0200 ERROR otopi.context 
context._executeMethod:154 Failed to execute stage 'Closing up': Failed 
executing ansible-playbook
2020-05-27 00:17:12,397+0200 DEBUG otopi.context 
context.dumpEnvironment:765 ENVIRONMENT DUMP - BEGIN
2020-05-27 00:17:12,398+0200 DEBUG otopi.context 
context.dumpEnvironment:775 ENV BASE/error=bool:'True'
2020-05-27 00:17:12,398+0200 DEBUG otopi.context 
context.dumpEnvironment:775 ENV BASE/exceptionInfo=list:'[('RuntimeError'>, RuntimeError('Failed executing ansible-playbook',), 
)]'
2020-05-27 00:17:12,398+0200 DEBUG otopi.context 
context.dumpEnvironment:779 ENVIRONMENT DUMP - END
2020-05-27 00:17:12,398+0200 INFO otopi.context context.runSequence:616 
Stage: Clean up
2020-05-27 00:17:12,399+0200 DEBUG otopi.context context.runSequence:620 
STAGE cleanup


Other than that, there is nothing that looks like an error (or contains 
the word "error").
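
A quick way to double-check that no ERROR record was missed is to filter the log by level rather than by eye. The Python sketch below assumes each record sits on a single line in the actual log file (the wrapping above comes from the mail client) and that records follow the "timestamp LEVEL logger origin message" layout seen in the excerpt. It matches the level field only, so a DEBUG line that merely contains the word "error" is not reported.

```python
import re

# Layout taken from the otopi log excerpt above
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{4}) "
    r"(?P<level>[A-Z]+) (?P<logger>\S+) (?P<origin>\S+) (?P<message>.*)$"
)


def error_entries(lines):
    """Return (timestamp, message) for every record logged at ERROR level."""
    entries = []
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match and match.group("level") == "ERROR":
            entries.append((match.group("ts"), match.group("message")))
    return entries


if __name__ == "__main__":
    sample = [
        "2020-05-27 00:17:12,397+0200 ERROR otopi.context "
        "context._executeMethod:154 Failed to execute stage 'Closing up': "
        "Failed executing ansible-playbook",
        "2020-05-27 00:17:12,397+0200 DEBUG otopi.context "
        "context.dumpEnvironment:765 ENVIRONMENT DUMP - BEGIN",
    ]
    for ts, message in error_entries(sample):
        print(ts, message)
```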



Also, if possible, please try to check/share the engine-setup log. If
you can access the engine VM, it's there, in:

/var/log/ovirt-engine/setup


The engine is not running after that, "hosted-engine --vm-status" gives 
me the following error:


It seems like a previous attempt to deploy hosted-engine failed or it's 
still in progress. Please clean it up before trying again

Otherwise, you might find it in the host doing the deployment, in:

/var/log/ovirt-hosted-engine-setup/engine-logs*

I have 4 directories like this (from my failed deployment attempts ;-)), 
but all of them are empty.


The last attempt was with a new backup, just in case.

The production oVirt is 4.3.9; the host I'm installing from is a clean 
install from the oVirt Node 4.4.0 release ISO with the latest available 
package upgrades.


Best regards
Oliver
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/3KXFKTXLHR3DQH6JHEGBIBURCLYGF3GB/


[ovirt-users] oVirt 4.4: Self-hosted engine deployment fails with backup restore from 4.3 engine

2020-05-27 Thread Oliver Leinfelder
Hi there,

I'm a bit puzzled about the possible upgrade paths from a 4.3 cluster to
version 4.4 in a self-hosted engine environment.

My idea was:

Set up a new host with a clean oVirt Node 4.4 installation, then deploy the
hosted engine on it with a restored backup from the production cluster
and go from there.

This however fails with the following error:

2020-05-27 00:17:08,886+0200 DEBUG
otopi.ovirt_hosted_engine_setup.ansible_utils
ansible_utils._process_output:103 {'msg': 'non-zero return code', 'cmd':
['engine-setup', '--accept-defaults',
'--config-append=/root/ovirt-engine-answers'], 'stdout': "[ INFO  ] Stage:
Initializing\n[ INFO  ] Stage: Environment setup\n
  Configuration files: /etc/ovirt-engine-setup.conf.d/10-packaging-jboss.conf,
/etc/ovirt-engine-setup.conf.d/10-packaging.conf,
/etc/ovirt-engine-setup.conf.d/20-setup-ovirt-post.conf,
/root/ovirt-engine-answers\n  Log file:
/var/log/ovirt-engine/setup/ovirt-engine-setup-20200527001657-fyeueu.log\n
 Version: otopi-1.9.1 (otopi-1.9.1-1.el8)\n[ INFO  ] DNF Downloading 1 files,
0.00KB\n[ INFO  ] DNF Downloaded CentOS-8 - AppStream\n[ INFO  ] DNF
Downloading 1 files, 0.00KB\n[ INFO  ] DNF Downloaded CentOS-8 - Base\n[
INFO  ] DNF Downloading 1 files, 0.00KB\n
[...]
... answers from backup config follow
[...]

2020-05-27 00:17:12,396+0200 DEBUG otopi.context context._executeMethod:145
method exception
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-ansiblesetup/core/misc.py", line 403, in _closeup
    r = ah.run()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_setup/ansible_utils.py", line 229, in run
    raise RuntimeError(_('Failed executing ansible-playbook'))

Is this approach (restoring from 4.3) generally supposed to work? If not,
what is the appropriate upgrade path?

Thank you!

Regards
Oli
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/CY6UZZKQEQJVHA73W3ODHDY3D3VI3WHE/