Hi,
I’ve been running the playbook from the release-3.7 branch of
openshift-ansible and occasionally I get errors restarting nodes. I’m not
looking for help with why my nodes are not restarting, but I am curious why
the playbook carries on after fatal errors that only lead to an overall
failure some 30 minutes later. This is especially annoying if you happen to be
a) not looking at the screen at the time of the original failure or
b) running the installation inside another IaC framework.
Is there a “stop on fatal” option that I’m missing?
Here’s a typical failure, (in my case) about 21 minutes in:
RUNNING HANDLER [openshift_node : restart node] ******************************************************************
Wednesday 14 March 2018 10:12:44 +0000 (0:00:00.081) 0:21:47.968 *******
skipping: [os-master-1]
skipping: [os-node-001]
FAILED - RETRYING: restart node (3 retries left).
FAILED - RETRYING: restart node (3 retries left).
FAILED - RETRYING: restart node (2 retries left).
FAILED - RETRYING: restart node (2 retries left).
FAILED - RETRYING: restart node (1 retries left).
FAILED - RETRYING: restart node (1 retries left).
fatal: [os-infra-1]: FAILED! => {"attempts": 3, "changed": false, "msg": "Unable to restart service origin-node: Job for origin-node.service failed because the control process exited with error code. See \"systemctl status origin-node.service\" and \"journalctl -xe\" for details.\n"}
fatal: [os-node-002]: FAILED! => {"attempts": 3, "changed": false, "msg": "Unable to restart service origin-node: Job for origin-node.service failed because the control process exited with error code. See \"systemctl status origin-node.service\" and \"journalctl -xe\" for details.\n"}
And the roll-out finally "gives up the ghost" (in my case) after a further 30
minutes...
TASK [debug] *****************************************************************************************************
Wednesday 14 March 2018 10:42:20 +0000 (0:00:00.117) 0:51:23.829 *******
skipping: [os-master-1]
to retry, use: --limit @/home/centos/abc/orchestrator/openshift/openshift-ansible/playbooks/byo/config.retry
PLAY RECAP *******************************************************************************************************
localhost   : ok=13  changed=0   unreachable=0 failed=0
os-infra-1  : ok=182 changed=70  unreachable=0 failed=1
os-master-1 : ok=539 changed=210 unreachable=0 failed=0
os-node-001 : ok=188 changed=65  unreachable=0 failed=0
os-node-002 : ok=165 changed=61  unreachable=0 failed=1
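For case (b), where the playbook runs inside another tool, here is a minimal sketch of a wrapper that watches the output and gives up at the first "fatal:" line rather than waiting for the recap. It is only a workaround, not an Ansible feature. The names fatal_host and run_until_fatal are made up for illustration, and the pattern is simply taken from the log lines above. It is also crude: it would stop on failures from tasks that are set to ignore errors, and it kills the run part-way through.

```python
import re
import subprocess

# Pattern taken from the log above, e.g.
#   fatal: [os-infra-1]: FAILED! => {"attempts": 3, ...}
FATAL_RE = re.compile(r"^fatal: \[(?P<host>[^\]]+)\]: FAILED!")


def fatal_host(line):
    """Return the host name if `line` is a 'fatal:' line, otherwise None."""
    match = FATAL_RE.match(line)
    return match.group("host") if match else None


def run_until_fatal(cmd):
    """Run `cmd`, echo its output and stop it at the first 'fatal:' line.

    Returns the name of the failing host, or None if no fatal line was seen.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    try:
        for line in proc.stdout:
            print(line, end="")
            host = fatal_host(line)
            if host is not None:
                # Give up now instead of some 30 minutes later.
                proc.terminate()
                return host
        return None
    finally:
        proc.stdout.close()
        proc.wait()
```

A call might look like run_until_fatal(["ansible-playbook", "-i", "<inventory>", "playbooks/byo/config.yml"]), where the inventory is a placeholder and the playbook path is guessed from the .retry file name above.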
Alan Christie
_______________________________________________
users mailing list
http://lists.openshift.redhat.com/openshiftmm/listinfo/users