Hi,

I have the following two components:

1.) A freshly installed VM host (oVirt Node 4.4.0 release ISO)
2.) 3 storage hosts, also freshly installed from oVirt Node 4.4.0 release ISO

The storage hosts have been successfully installed with Gluster (through Cockpit). They have two volumes, both of which I can mount and read/write from a client.

On the VM host, I ran "hosted-engine --deploy" (no backups imported).

When prompted for storage, I answered "glusterfs" and specified "storagehost1:/engine" as storage for the HE deployment. For mount options, I specified "backup-volfile-servers=storagehost2:/engine:storagehost3:/engine"

(Not the real hostnames, but all of them are resolvable via internal DNS)

Everything seems to works fine, I also see the "engine" volume become populated with data. At some point I could ping and SSH login to the HE.

When the setup proceed to health check, it failed and the whole process was aborted :-(

"hosted-engine --vm-status" reported "failed liveliness check" when it was reachable via SSH. At some point the engine went down and, to my surprise, shows a grub prompt after the restart when doing a "hosted-engine --console".

[ INFO  ] TASK [ovirt.hosted_engine_setup : Check engine VM health]
[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 180, "changed": true, "cmd": ["hosted-engine", "--vm-status", "--json"], "delta": "0:00:00.160595", "end": "2020-06-12 17:50:05.675774", "rc": 0, "start": "2020-06-12 17:50:05.515179", "stderr": "", "stderr_lines": [], "stdout": "{\"1\": {\"host-id\": 1, \"host-ts\": 11528, \"score\": 3400, \"engine-status\": {\"vm\": \"up\", \"health\": \"bad\", \"detail\": \"Powering down\", \"reason\": \"failed liveliness check\"}, \"hostname\": \"vmhost\", \"maintenance\": false, \"stopped\": false, \"crc32\": \"2c447835\", \"conf_on_shared_storage\": true, \"local_conf_timestamp\": 11528, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=11528 (Fri Jun 12 17:49:57 2020)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=11528 (Fri Jun 12 17:49:57 2020)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStop\\nstopped=False\\ntimeout=Thu Jan  1 04:12:48 1970\\n\", \"live-data\": true}, \"global_maintenance\": false}", "stdout_lines": ["{\"1\": {\"host-id\": 1, \"host-ts\": 11528, \"score\": 3400, \"engine-status\": {\"vm\": \"up\", \"health\": \"bad\", \"detail\": \"Powering down\", \"reason\": \"failed liveliness check\"}, \"hostname\": \"vmhost\", \"maintenance\": false, \"stopped\": false, \"crc32\": \"2c447835\", \"conf_on_shared_storage\": true, \"local_conf_timestamp\": 11528, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=11528 (Fri Jun 12 17:49:57 2020)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=11528 (Fri Jun 12 17:49:57 2020)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStop\\nstopped=False\\ntimeout=Thu Jan  1 04:12:48 1970\\n\", \"live-data\": true}, \"global_maintenance\": false}"]}

A second attempt failed at exactly the same stage.

I can see the following in the setup log:

ovirt-hosted-engine-setup-20200612151212-j9zwd2.log:
2020-06-12 17:33:18,314+0200 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 {'msg': 'non-zero return code', 'cmd': ['hosted-engine', '--reinitialize-lockspace', '--force'], 'stdout': '', 'stderr': 'Traceback (most recent call last):\n   File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main\n    "__main__", mod_spec)\n  File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code\n exec(code, run_globals)\n  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_setup/reinitialize_ lockspace.py", line 30, in <module>\n ha_cli.reset_lockspace(force)\n  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 286, in reset_lockspace\n    stats = broker.get_stats_from_storage()\n  File "/usr/lib/python3.6/site-packages/ov irt_hosted_engine_ha/lib/brokerlink.py", line 148, in get_stats_from_storage\n    result = self._proxy.get_stats()\n  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1112, in __call__\n return self.__send(self.__name, args)\n  File "/usr/lib64/python3.6/xmlrpc/client .py", line 1452, in __request\n    verbose=self.__verbose\n  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1154, in request\n return self.single_request(host, handler, request_body, verbose)\n File "/usr/lib64/python3.6/xmlrpc/client.py", line 1166, in single_requ est\n    http_conn = self.send_request(host, handler, request_body, verbose)\n  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1279, in send_request\n    self.send_content(connection, request_body)\n File "/usr/lib64/python3.6/xmlrpc/client.py", line 1309, in send_con tent\n    connection.endheaders(request_body)\n  File "/usr/lib64/python3.6/http/client.py", line 1249, in endheaders\n self._send_output(message_body, encode_chunked=encode_chunked)\n File "/usr/lib64/python3.6/http/client.py", line 1036, in _send_output\n    self.sen d(msg)\n  File "/usr/lib64/python3.6/http/client.py", line 974, in send\n    self.connect()\n  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/unixrpc.py", line 74, in connect\n self.sock.connect(base64.b16decode(self.host))\nFileNotFoundError: [Errno 2] No such file or directory', 'rc': 1, 'start': '2020-06-12 17:33:17.982995', 'end': '2020-06-12 17:33:18.164197', 'delta': '0:00:00.181202', 'changed': True, 'invocation': {'module_args': {'_raw_params': 'hosted-engine --reinitialize-lockspace --force', 'warn': True, '_u ses_shell': False, 'stdin_add_newline': True, 'strip_empty_ends': True, 'argv': None, 'chdir': None, 'executable': None, 'creates': None, 'removes': None, 'stdin': None}}, 'stdout_lines': [], 'stderr_lines': ['Traceback (most recent call last):', '  File "/usr/lib64/python 3.6/runpy.py", line 193, in _run_module_as_main', '    "__main__", mod_spec)', '  File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code', '    exec(code, run_globals)', '  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_setup/reinitialize_lockspace.py", li ne 30, in <module>', '    ha_cli.reset_lockspace(force)', ' File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 286, in reset_lockspace', '    stats = broker.get_stats_from_storage()', '  File "/usr/lib/python3.6/site-packages/ovirt_hoste d_engine_ha/lib/brokerlink.py", line 148, in get_stats_from_storage', '    result = self._proxy.get_stats()', ' File "/usr/lib64/python3.6/xmlrpc/client.py", line 1112, in __call__', '    return self.__send(self.__name, args)', '  File "/usr/lib64/python3.6/xmlrpc/client. py", line 1452, in __request', '    verbose=self.__verbose', '  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1154, in request', '    return self.single_request(host, handler, request_body, verbose)', '  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1166, in sing le_request', '    http_conn = self.send_request(host, handler, request_body, verbose)', '  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1279, in send_request', '    self.send_content(connection, request_body)', ' File "/usr/lib64/python3.6/xmlrpc/client.py", line 13 09, in send_content', '    connection.endheaders(request_body)', ' File "/usr/lib64/python3.6/http/client.py", line 1249, in endheaders', '    self._send_output(message_body, encode_chunked=encode_chunked)', '  File "/usr/lib64/python3.6/http/client.py", line 1036, in _se nd_output', '    self.send(msg)', '  File "/usr/lib64/python3.6/http/client.py", line 974, in send', ' self.connect()', '  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/unixrpc.py", line 74, in connect', '    self.sock.connect(base64.b16decode(self.h ost))', 'FileNotFoundError: [Errno 2] No such file or directory'], '_ansible_no_log': False, 'attempts': 5}


Bonus question: I can see the following in the gluster volume log on the storage host: [2020-06-12 17:30:38.368773] E [MSGID: 113072] [posix-inode-fd-ops.c:1976:posix_writev] 0-engine-posix: write failed: offset 0, [Invalid argument] [2020-06-12 17:30:38.368832] E [MSGID: 115067] [server-rpc-fops_v2.c:1372:server4_writev_cbk] 0-engine-server: 190586: WRITEV 4 (baf5fbaa-0dc0-4ee4-bfd9-fa98f8853827), client: CTX_ID:be1aca7a-5ba7-453d-94fd-d24f8c33511e-GRAPH_ID:0-PID:121217-HOST:vmhost-PC_NAME:engine-client-0-RECON_NO:-0, error-xlator: engine-posix [Invalid argument]

There are some entries every few minutes like this (not like 100s per minute). Since gluster is completely new for me, I'm still trying to find out whether this might be related.

Thanks for any input!

Best regards
Oliver
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/3BQMCIGCEOLIOV3LSW47GVPKSMOOK7IL/

Reply via email to