Can the Origin Ansible Playbook stop on "Restart node" fatal errors?

Alan Christie Wed, 14 Mar 2018 03:47:29 -0700

Hi,

I’ve been running the Ansible playbook from the release-3.7 branch and
occasionally I get errors restarting nodes. I’m not looking for help with why
my nodes are not restarting, but I am curious why the playbook continues after
fatal errors that only lead to an overall failure some 30 minutes later.
This is especially annoying if you happen to be a) not looking at the screen
at the time of the original failure or b) running the installation inside
another IaC framework.


Is there a “stop on fatal” option that I’m missing, by chance?

Here’s a typical failure, in my case about 21 minutes in:

RUNNING HANDLER [openshift_node : restart node] 
******************************************************************
Wednesday 14 March 2018  10:12:44 +0000 (0:00:00.081)       0:21:47.968 ******* 
skipping: [os-master-1]
skipping: [os-node-001]
FAILED - RETRYING: restart node (3 retries left).
FAILED - RETRYING: restart node (3 retries left).
FAILED - RETRYING: restart node (2 retries left).
FAILED - RETRYING: restart node (2 retries left).
FAILED - RETRYING: restart node (1 retries left).
FAILED - RETRYING: restart node (1 retries left).
fatal: [os-infra-1]: FAILED! => {"attempts": 3, "changed": false, "msg": 
"Unable to restart service origin-node: Job for origin-node.service failed 
because the control process exited with error code. See \"systemctl status 
origin-node.service\" and \"journalctl -xe\" for details.\n"}
fatal: [os-node-002]: FAILED! => {"attempts": 3, "changed": false, "msg": 
"Unable to restart service origin-node: Job for origin-node.service failed 
because the control process exited with error code. See \"systemctl status 
origin-node.service\" and \"journalctl -xe\" for details.\n"}

The roll-out finally "gives up the ghost" (in my case) a further 30 minutes
later:

TASK [debug] 
*****************************************************************************************************
Wednesday 14 March 2018  10:42:20 +0000 (0:00:00.117)       0:51:23.829 ******* 
skipping: [os-master-1]
        to retry, use: --limit 
@/home/centos/abc/orchestrator/openshift/openshift-ansible/playbooks/byo/config.retry

PLAY RECAP 
*******************************************************************************************************
localhost                  : ok=13   changed=0    unreachable=0    failed=0   
os-infra-1                 : ok=182  changed=70   unreachable=0    failed=1   
os-master-1                : ok=539  changed=210  unreachable=0    failed=0   
os-node-001                : ok=188  changed=65   unreachable=0    failed=0   
os-node-002                : ok=165  changed=61   unreachable=0    failed=1
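
When the installation runs inside another IaC framework, one stopgap might be
to wrap the playbook run in a small script that watches the output and
terminates the run at the first "fatal:" line, rather than waiting for the
PLAY RECAP. The following is only a sketch under assumptions: the function
names is_fatal and run_until_fatal are hypothetical, the match is based purely
on the log format shown above, and it would also stop on failures that the
playbook deliberately ignores or rescues.

```python
import subprocess
import sys


def is_fatal(line):
    """Return True if an output line looks like an Ansible fatal task failure.

    Assumption: fatal lines have the form 'fatal: [host]: FAILED! => {...}',
    as in the log excerpts above.
    """
    return line.startswith("fatal: [") and "FAILED!" in line


def run_until_fatal(cmd):
    """Run cmd, echo its output, and terminate it at the first fatal line.

    Returns the offending line, or None if the command finished without one.
    """
    fatal_line = None
    # Merge stderr into stdout so all output is scanned in order.
    with subprocess.Popen(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True) as proc:
        for line in proc.stdout:
            sys.stdout.write(line)
            if is_fatal(line):
                fatal_line = line.rstrip("\n")
                proc.terminate()  # ask the playbook run to stop now
                break
    return fatal_line
```

It would be called with the usual command as a list, for example
run_until_fatal(["ansible-playbook", "-i", "inventory", "config.yml"]), where
the inventory and playbook names are placeholders. A non-None return value
means the run was cut short.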

Alan Christie

_______________________________________________
users mailing list
[email protected]
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
