Discussion ongoing in new GitHub issue: https://github.com/openshift/openshift-ansible/issues/5088
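
For anyone arriving here from that issue, a quick recap of the diagnosis
steps from the thread below: check whether the node service is actually
running on the host that failed to register, then read its journal.
(origin-node.service is the service name reported further down this
thread; substitute whatever name the grep turns up on your host.)

    systemctl list-units --all | grep -i origin
    journalctl -x -u origin-node.service
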
On Tue, Aug 15, 2017 at 8:44 AM, Tim Bielawa <[email protected]> wrote:

> Tim,
>
> Can you please provide more information? Your full inventory would be very
> useful right now for debugging. Feel free to mask your hostnames if you
> wish. What I need to see to debug this further are all the parameters
> you're setting in the [OSEv3] section and applying to each host in
> [masters] and [nodes].
>
> You will find my GPG public key fingerprint in my signature if you wish to
> encrypt the inventory file instead.
>
> As for those two stalls you mentioned:
>
>     "Ensure OpenShift <THING> correctly rolls out (best-effort today)"
>
> The delays you experienced are normal and expected. Those delays are
> typically because the pod images were being downloaded to your hosts.
> However, you showed your 'oc get nodes' output and I noticed your master
> said "Ready,SchedulingDisabled". Because your master is labeled
> 'SchedulingDisabled', it should *NOT* be running any pods. In that case it
> wasn't downloading pod images.
>
> Can you please provide the following information:
>
> * The output from `oc get all` on your master
> * The output of `docker images` on your node *AND* your master
> * Your complete inventory file. As I said before, feel free to mask your
>   hostnames or IPs if you prefer.
>
> Your logs would also be helpful. Ensure you run ansible-playbook with the
> -vv option for extra verbosity. You can do this in two ways:
>
> 1) If you run the install again you can set:
>
>     log_path = /tmp/ansible.log
>
> in the [defaults] section of your ansible.cfg file.
>
> 2) Alternatively you can capture the output of ansible using the `tee`
> command like so:
>
>     ansible-playbook -vv -i <INVENTORY> ./playbooks/byo/config.yml | tee /tmp/ansible.log
>
> Again, if you wish to keep this information private, my GPG key is in my
> signature. Short ID is 0333AE37.
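>
> For example, you could encrypt the file along these lines (a rough
> sketch: it assumes gpg is installed and can fetch my key from your
> default keyserver, and "inventory.ini" is just a placeholder for your
> inventory filename):
>
>     # fetch my public key by its short ID, then encrypt the file to it
>     gpg --recv-keys 0333AE37
>     gpg --armor --encrypt --recipient 0333AE37 inventory.ini
>
> That should produce an ASCII-armored inventory.ini.asc you can attach
> to your reply.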
>
> Thanks!
>
> On Tue, Aug 15, 2017 at 6:15 AM, Tim Dudgeon <[email protected]> wrote:
>
>> Thanks for the response, and sorry for the delay on my end - I've been
>> away for a week.
>>
>> I ran through the process again and got the same result. On the node it
>> looks like the openshift services are running OK:
>>
>>     systemctl list-units --all | grep -i origin
>>     origin-node.service    loaded    active    running    OpenShift Node
>>
>> But from the master the node has not joined the cluster:
>>
>>     oc get nodes
>>     NAME                                                           STATUS                     AGE       VERSION
>>     2c0e37ab-f41e-40f1-a466-a575c85823b6.priv.cloud.scaleway.com   Ready,SchedulingDisabled   26m       v1.6.1+5115d708d7
>>
>> The install process seems to have gone OK. There were no obvious errors,
>> though it did twice stall at a point like this:
>>
>>     TASK [openshift_hosted : Ensure OpenShift router correctly rolls out (best-effort today)] ***
>>
>> But after waiting for about 5-10 mins it continued.
>>
>> There were a lot of 'skipping' messages during the install, but no
>> obvious errors. The output was huge and not captured to a file, so I'd
>> have to run it again to try to get a full log.
>>
>> Any thoughts as to what is wrong?
>>
>> Tim
>>
>> On 04/08/2017 16:07, Tim Bielawa wrote:
>>
>> (reposting: forgot to reply-all the first time)
>>
>> Just based off of the number of tasks your summary says completed, I am
>> not sure your installation actually completed in full. I expect to see
>> upwards of one to two thousand tasks.
>>
>> A while back we changed node integration behavior such that if a node
>> fails to provision it does not stop your entire installation. This is to
>> ease the pain felt when provisioning large (hundred+) node clusters.
>>
>>     <private node1 dns name> : ok=235 changed=56 unreachable=0 failed=0
>>
>> That node did not fully install. Open a shell on that node and check the
>> openshift services. I'm willing to bet that
>>
>>     systemctl list-units --all | grep -i origin
>>
>> would show the node service is not running. Find the name of the node
>> service and then examine the journal logs for that node:
>>
>>     journalctl -x -u <node-service-name>
>>
>> I think we (the openshift-ansible team) will want to add detection of
>> failed node integrations into our error summary report in the future.
>> Would you please open an issue for this on our GitHub page with this
>> information?
>>
>> Thanks!
>>
>> On Sun, Jul 30, 2017 at 10:57 AM, Tim Dudgeon <[email protected]> wrote:
>>
>>> I'm trying to get to grips with the advanced (Ansible) installer.
>>> Initially I'm trying to do something very simple: fire up a cluster
>>> with one master and one node.
>>>
>>> My inventory file looks like this:
>>>
>>>     [OSEv3:children]
>>>     masters
>>>     nodes
>>>
>>>     [OSEv3:vars]
>>>     ansible_ssh_user=root
>>>     openshift_hostname=<private master dns name>
>>>     openshift_master_cluster_hostname=<private master dns name>
>>>     openshift_master_cluster_public_hostname=<public master dns name>
>>>     openshift_disable_check=docker_storage,memory_availability
>>>     openshift_deployment_type=origin
>>>
>>>     [masters]
>>>     <private master dns name>
>>>
>>>     [etcd]
>>>     <private master dns name>
>>>
>>>     [nodes]
>>>     <private master dns name>
>>>     <private node1 dns name>
>>>
>>> I run:
>>>
>>>     ansible-playbook ~/openshift-ansible/playbooks/byo/config.yml
>>>
>>> and (after a long time) it completes, without any noticeable errors:
>>>
>>>     ...
>>>     PLAY RECAP *********************************************************
>>>     <private node1 dns name>    : ok=235  changed=56   unreachable=0  failed=0
>>>     <private master dns name>   : ok=623  changed=166  unreachable=0  failed=0
>>>     localhost                   : ok=12   changed=0    unreachable=0  failed=0
>>>
>>> Both nodes seem to have been set up OK.
>>>
>>> But when I look on the master node there is only the master in the
>>> cluster, no second node:
>>>
>>>     oc get nodes
>>>     NAME                         STATUS                     AGE
>>>     <private master dns name>    Ready,SchedulingDisabled   32m
>>>
>>> and of course like this nothing can get scheduled.
>>>
>>> Presumably the node should be added to the cluster, so any ideas what
>>> is going wrong here?
>>>
>>> Thanks
>>> Tim
>>>
>>> _______________________________________________
>>> users mailing list
>>> [email protected]
>>> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>>
>> --
>> Tim Bielawa, Sr. Software Engineer [ED-C137]
>> IRC: tbielawa (#openshift)
>> 1BA0 4FAB 4C13 FBA0 A036 4958 AD05 E75E 0333 AE37
>
> --
> Tim Bielawa, Sr. Software Engineer [ED-C137]
> IRC: tbielawa (#openshift)
> 1BA0 4FAB 4C13 FBA0 A036 4958 AD05 E75E 0333 AE37

--
Tim Bielawa, Sr. Software Engineer [ED-C137]
Cell: 919.332.6411 | IRC: tbielawa (#openshift)
1BA0 4FAB 4C13 FBA0 A036 4958 AD05 E75E 0333 AE37
_______________________________________________
users mailing list
[email protected]
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
