On Tue, Oct 25, 2022 at 6:27 AM Matthew J Black <matt...@peregrineit.net> wrote:
>
> OK, so, with all the to-ing and fro-ing, things stand as follows (@03:15UTC
> 25-Oct-2022):
>
> - I managed to solve the "DNF Timeout" issue (see my post "Local (Deployment)
> VM Can't Reach "centos-ceph-pacific" Repo") and so simplified the deployment
> command to `hosted-engine --deploy`. Unfortunately this still results in a
> "Host is not up" error, with the same logs as before.
>
> - As mentioned elsewhere in this thread, I uploaded the (previous) logs to
> Dropbox along with a couple of other relevant(?) files:
> https://www.dropbox.com/sh/eymwdy8hzn3sa7z/AACscSP2eaFfoiN-QzyeEVfaa?dl=0
>
> - I followed the suggestion of ajude.pereira (see post in this thread) but
> this did not resolve the issue.
>
> - As per one of my other posts in this thread, digging into the logs further
> revealed this issue: "Failed to authenticate session
> with host 'ovirt_node_1.mynet.local': SSH authentication to
> 'root@ovirt_node_1.mynet.local' failed. Please verify provided credentials.
> Make sure key is authorized at host"
>
> - I also did a `hosted-engine --deploy
> --ansible-extra-vars=he_pause_host=true` (as per the suggestion of Konstantin
> - see post in this thread) and tried to work out why ssh wasn't working. I
> ssh'd into the deployment VM and then attempted to ssh back into the
> deployment host (i.e. `ssh root@ovirt_node_1.mynet.local`). While I could
> connect, I was asked for root's password.
Good.

> I was under the impression that this was supposed to be a "password-less"
> operation.

It should.

At this point, the operation that is attempted, and which is failing with
the error you see in engine.log ("Failed to authenticate session"), is done
by Java code, using the Java library apache-sshd, not the command-line ssh.
Some of the relevant code is here:

https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/hostdeploy/AddVdsCommand.java

I do not know this code well, sorry, nor the specifics of apache-sshd vs
openssh (and there are such "specifics", as can easily be seen by looking at
the engine git log).

> As I do not provide the root@ovirt_node_1.mynet.local password anywhere in
> the deployment script, I suspect that this is why I'm getting the "Host is
> not up" error.
>
> - To reiterate: the host's sshd_config file is configured as per the oVirt
> documentation.
>
> So am I wrong in my understanding of the password-less ssh-nature of the
> situation and how the deployment script is supposed to work?

I think this should work more or less like this:

After running engine-setup, and when the engine is already up, we fetch the
public key of the engine from it, and store it in your authorized_keys file.
This is done here:

https://github.com/oVirt/ovirt-ansible-collection/blob/master/roles/hosted_engine_setup/tasks/bootstrap_local_vm/05_add_host.yml#L36

- name: Set Engine public key as authorized key without validating the TLS/SSL certificates

I do see this in your log in dropbox. Do you see /root/.ssh/authorized_keys
on the host (with a timestamp similar to the log line)?

If so, you can try this, from the engine VM:

    ssh -v -i /etc/pki/ovirt-engine/keys/engine_id_rsa ovirt_node_1.mynet.local

If this does not work, you can continue debugging until you manage to
understand and fix it. Perhaps check sshd config etc.
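As a rough aid for the "check authorized_keys and sshd config" step, here is
a minimal sketch of a shell helper you could run on the host. The function
name check_authorized_keys is hypothetical (it is not part of oVirt), and it
assumes GNU stat as found on EL hosts. It only looks at the file and the
directory holding it; sshd's StrictModes option (on by default) also checks
ownership and the home directory, so treat a passing result as a hint, not
proof.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of oVirt: sanity-check an authorized_keys file.
# With StrictModes enabled, sshd ignores the file if it, or the directory
# holding it, is writable by group or others.
check_authorized_keys() {
    local file="$1" dir path mode
    dir=$(dirname "$file")

    if [ ! -f "$file" ]; then
        echo "missing: $file"
        return 1
    fi

    for path in "$dir" "$file"; do
        mode=$(stat -c '%a' "$path")    # octal permissions, e.g. 600 (GNU stat)
        # Group-write (020) and other-write (002) bits must both be clear.
        if [ $(( 0$mode & 022 )) -ne 0 ]; then
            echo "too permissive: $path (mode $mode)"
            return 1
        fi
    done

    # Print the modification time so it can be compared with the log line.
    echo "ok: $file, modified $(stat -c '%y' "$file")"
    return 0
}
```

On the host you would call it as
`check_authorized_keys /root/.ssh/authorized_keys`. To see the configuration
sshd actually uses, `sshd -T` (run as root) prints the effective settings;
piping it through `grep -i -E 'strictmodes|permitrootlogin|pubkeyauthentication|authorizedkeysfile'`
narrows the output to the options most likely to matter here.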
If it does work, it means the issue might be due to incompatibility between
apache-sshd and openssh and/or the configuration.

> Also, does *anyone* have any pointers, suggestions, or can otherwise help me
> out - thanks.

At this point, you should be able to log into the admin UI (the pause
message provides a link) and try to manually add the host. It seems like
this didn't work for you. This is because "host_result_up_check" is
"failed", and we pause only if it succeeded and the host is returned with
status "non_operational". Feel free to create an issue to make the code
pause also if "host_result_up_check" is "failed" - not sure why we do not,
perhaps we did have a reason.

Anyway, you can force the code to pause after trying to add the host but
before checking if this worked, by passing
"--ansible-extra-vars=he_pause_host=true".

You can also check/share more of engine.log - there might be more
information prior to the failure (but as I said, I do not know this code
well).

You can try running sshd (the server) with debug info and check its own
log - the issue might be due to incompatible keys on one or both of the
sides, or something like that.

Sorry that I do not remember if you wrote this before - is this your first
attempt to install oVirt? If so, perhaps try first to start with a clean
host, without any custom configuration (e.g. of sshd), and see if this works
for you. If you do have access to a successful setup, you can more easily
compare.

Good luck and best regards,
--
Didi