This is still troubling me. I would welcome any input on this.

When I run an Ansible install (using Origin 3.7.1 on CentOS 7 nodes) the DNS setup on some nodes seems to get broken at random. For instance, I've just run a setup with 1 master, 1 infra and 2 identical worker nodes.

During the installation one of the worker nodes starts responding very slowly. The other is fine.
Looking deeper, on the slow-responding node I see a DNS setup like this:

[centos@xxx-node-001 ~]$ sudo netstat -tunlp | grep tcp | grep :53 | grep -v tcp6
tcp        0      0 10.0.0.20:53            0.0.0.0:*               LISTEN      14727/dnsmasq
tcp        0      0 172.17.0.1:53           0.0.0.0:*               LISTEN      14727/dnsmasq
[centos@xxx-node-001 ~]$ host orndev-bastion-002
;; connection timed out; trying next origin
orndev-bastion-002.openstacklocal has address 10.0.0.9

Whilst on the good one it looks like this:

[centos@xxx-node-002 ~]$ sudo netstat -tunlp | grep tcp | grep :53 | grep -v tcp6
tcp        0      0 127.0.0.1:53            0.0.0.0:*               LISTEN      17231/openshift
tcp        0      0 10.129.0.1:53           0.0.0.0:*               LISTEN      14563/dnsmasq
tcp        0      0 10.0.0.22:53            0.0.0.0:*               LISTEN      14563/dnsmasq
tcp        0      0 172.17.0.1:53           0.0.0.0:*               LISTEN      14563/dnsmasq
[centos@xxx-node-002 ~]$ host orndev-bastion-002
orndev-bastion-002.openstacklocal has address 10.0.0.9
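
Notice that on the slow node two of the DNS listeners seen on the good node are not present (the openshift one on 127.0.0.1:53 and one of the dnsmasq ones), and how this causes the DNS lookup to time out locally before falling back to an upstream server.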
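
For anyone wanting to run the same comparison across several nodes, here is a minimal Python sketch. It is not part of openshift-ansible; the function names `dns_listeners` and `listener_counts` are made up for illustration, and it assumes the usual `netstat -tunlp` column layout shown above. It counts IPv4 TCP listeners on port 53 per program and reports what the good node has that the slow node lacks.

```python
from collections import Counter


def dns_listeners(netstat_output):
    """Return (local_address, program) pairs for IPv4 TCP listeners on port 53."""
    listeners = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: proto recv-q send-q local foreign state pid/program
        if len(fields) < 7 or fields[0] != "tcp" or fields[5] != "LISTEN":
            continue
        local = fields[3]
        if local.rsplit(":", 1)[-1] != "53":
            continue  # skips e.g. 0.0.0.0:8053
        program = fields[6].split("/", 1)[-1]
        listeners.append((local, program))
    return listeners


def listener_counts(netstat_output):
    """Count port 53 listeners per owning program."""
    return Counter(program for _, program in dns_listeners(netstat_output))


# Output copied from the two worker nodes above.
slow_node = """\
tcp        0      0 10.0.0.20:53            0.0.0.0:*               LISTEN      14727/dnsmasq
tcp        0      0 172.17.0.1:53           0.0.0.0:*               LISTEN      14727/dnsmasq
"""
good_node = """\
tcp        0      0 127.0.0.1:53            0.0.0.0:*               LISTEN      17231/openshift
tcp        0      0 10.129.0.1:53           0.0.0.0:*               LISTEN      14563/dnsmasq
tcp        0      0 10.0.0.22:53            0.0.0.0:*               LISTEN      14563/dnsmasq
tcp        0      0 172.17.0.1:53           0.0.0.0:*               LISTEN      14563/dnsmasq
"""

# Counter subtraction keeps only the listeners the good node has in excess.
missing = listener_counts(good_node) - listener_counts(slow_node)
print(dict(missing))
```

The comparison is by program rather than by address because the node and pod-network addresses legitimately differ from node to node.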

Getting into this state seems to be a random event.

Any thoughts?



On 01/03/18 14:30, Tim Dudgeon wrote:

Yes, I think it is related to DNS.

On a similar, but working, OpenStack environment `netstat -tunlp | grep ...` shows this:

tcp        0      0 127.0.0.1:53            0.0.0.0:*               LISTEN      16957/openshift
tcp        0      0 10.128.0.1:53           0.0.0.0:*               LISTEN      16248/dnsmasq
tcp        0      0 10.0.0.5:53             0.0.0.0:*               LISTEN      16248/dnsmasq
tcp        0      0 172.17.0.1:53           0.0.0.0:*               LISTEN      16248/dnsmasq
tcp        0      0 0.0.0.0:8053            0.0.0.0:*               LISTEN      12270/openshift

On the environment where the TSB is failing to start I'm seeing:

tcp        0      0 127.0.0.1:53            0.0.0.0:*               LISTEN      19067/openshift
tcp        0      0 10.129.0.1:53           0.0.0.0:*               LISTEN      16062/dnsmasq
tcp        0      0 172.17.0.1:53           0.0.0.0:*               LISTEN      16062/dnsmasq
tcp        0      0 0.0.0.0:8053            0.0.0.0:*               LISTEN      11628/openshift

Notice that in the first case dnsmasq is listening on the machine's IP address (the 10.0.0.5:53 entry), but in the second case this listener is missing.

Both environments have been created with the openshift-ansible playbooks using an approach that is as equivalent as is possible. The contents of /etc/dnsmasq.d/ on the two systems also seem to be equivalent.

Any thoughts?



On 28/02/18 18:50, Nobuhiro Sue wrote:
Tim,

It seems to be a DNS issue. I guess your environment is on OpenStack, so please check the resolver (forward and reverse lookup).
You can see how DNS works on OpenShift 3.6 or above:
https://blog.openshift.com/dns-changes-red-hat-openshift-container-platform-3-6/
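
One way to run the forward and reverse lookup check on a node is sketched below in Python, using only the standard `socket` module. The helper name `check_lookup` and its injectable `forward`/`reverse` parameters are illustrative assumptions, not anything OpenShift provides. It goes through the system resolver, so it exercises the same /etc/resolv.conf path as other processes on the host.

```python
import socket


def check_lookup(hostname, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve hostname to an address, then resolve that address back to a name."""
    result = {"hostname": hostname, "address": None, "reverse_name": None, "error": None}
    try:
        result["address"] = forward(hostname)
        # gethostbyaddr returns (primary_name, alias_list, address_list)
        result["reverse_name"] = reverse(result["address"])[0]
    except OSError as exc:
        # socket.gaierror and socket.herror are both subclasses of OSError
        result["error"] = str(exc)
    return result
```

Calling `check_lookup("orndev-bastion-002")` on each node and comparing the results (and how long the call takes) should show whether a node's resolver is misbehaving.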

2018-03-01 0:06 GMT+09:00 Tim Dudgeon <tdudgeon...@gmail.com>:

    Hi

    I'm having problems getting an Origin cluster running, using the
    ansible playbooks.
    It fails at this point:

    TASK [template_service_broker : Verify that TSB is running] ****************************************
    FAILED - RETRYING: Verify that TSB is running (120 retries left).
    FAILED - RETRYING: Verify that TSB is running (119 retries left).
    <snip>
    FAILED - RETRYING: Verify that TSB is running (1 retries left).
    fatal: [master-01.novalocal]: FAILED! => {"attempts": 120,
    "changed": false, "cmd": ["curl", "-k",
    "https://apiserver.openshift-template-service-broker.svc/healthz"],
    "delta": "0:00:01.529402", "end": "2018-02-28 14:49:30.190842",
    "msg": "non-zero return code", "rc": 7, "start": "2018-02-28
    14:49:28.661440", "stderr": "  % Total    % Received % Xferd
    Average Speed   Time Time     Time Current\n Dload  Upload  
    Total Spent    Left  Speed\n\r  0     0 0     0    0     0 0     
    0 --:--:-- --:--:-- --:--:-- 0\r  0     0    0     0 0     0     
    0      0 --:--:-- 0:00:01 --:--:--     0curl: (7) Failed connect
    to apiserver.openshift-template-service-broker.svc:443; No route
    to host", "stderr_lines": ["  % Total    % Received % Xferd 
    Average Speed   Time    Time     Time Current", "     Dload
    Upload   Total   Spent Left  Speed", "", "  0     0 0     0   
    0     0      0      0 --:--:-- --:--:-- --:--:--     0", "  0    
    0    0     0    0 0      0 0 --:--:--  0:00:01 --:--:--    
    0curl: (7) Failed connect to
    apiserver.openshift-template-service-broker.svc:443; No route to
    host"], "stdout": "", "stdout_lines": []}

    All I can find in the logs on the master that seems relevant is:

    Feb 28 14:43:25 master-01.novalocal
    origin-master-controllers[9396]: E0228 14:43:25.394326    9396
    daemoncontroller.go:255]
    openshift-template-service-broker/apiserver failed with : error
    storing status for daemon set
    &v1beta1.DaemonSet{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""},
    ObjectMeta:v1.ObjectMeta{Name:"apiserver", GenerateName:"",
    Namespace:"openshift-template-service-broker",
    SelfLink:"/apis/extensions/v1beta1/namespaces/openshift-template-service-broker/daemonsets/apiserver",
    UID:"baa14f98-1c95-11e8-8a02-fa163e3f98d8",
    ResourceVersion:"2972", Generation:1,
    CreationTimestamp:v1.Time{Time:time.Time{sec:63655425804, nsec:0,
    loc:(*time.Location)(0x111a3dc0)}},
    DeletionTimestamp:(*v1.Time)(nil),
    DeletionGracePeriodSeconds:(*int64)(nil),
    Labels:map[string]string{"apiserver":"true"},
    Annotations:map[string]string{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"extensions/v1beta1\",\"kind\":\"DaemonSet\",\"metadata\":{\"annotations\":{},\"labels\":{\"apiserver\":\"true\"},\"name\":\"apiserver\",\"namespace\":\"openshift-template-service-broker\"},\"spec\":{\"template\":{\"metadata\":{\"labels\":{\"apiserver\":\"true\"},\"name\":\"apiserver\"},\"spec\":{\"containers\":[{\"command\":[\"/usr/bin/openshift\",\"start\",\"template-service-broker\",\"--secure-port=8443\",\"--audit-log-path=-\",\"--tls-cert-file=/var/serving-cert/tls.crt\",\"--tls-private-key-file=/var/serving-cert/tls.key\",\"--loglevel=0\",\"--config=/var/apiserver-config/apiserver-config.yaml\"],\"image\":\"docker.io/openshift/origin:latest\",\"imagePullPolicy\":\"IfNotPresent\",\"name\":\"c\",\"ports\":[{\"containerPort\":8443}],\"readinessProbe\":{\"httpGet\":{\"path\":\"/healthz\",\"port\":8443,\"scheme\":\"HTTPS\"}},\"volumeMounts\":[{\"mountPath\":\"/var/serving-cert\",\"name\":\"serving-cert\"},{\"mountPath\":\"/var/apiserver-config\",\"name\":\"apiserver-config\"}]}],\"nodeSelector\":{\"region\":\"infra\"},\"serviceAccountName\":\"apiserver\",\"volumes\":[{\"name\":\"serving-cert\",\"secret\":{\"defaultMode\":420,\"secretName\":\"apiserver-serving-cert\"}},{\"configMap\":{\"defaultMode\

    Any ideas what might be going wrong?



    _______________________________________________
    users mailing list
    users@lists.openshift.redhat.com
    <mailto:users@lists.openshift.redhat.com>
    http://lists.openshift.redhat.com/openshiftmm/listinfo/users
    <http://lists.openshift.redhat.com/openshiftmm/listinfo/users>




--

須江 信洋(NOBUHIRO SUE)

SENIOR SOLUTION ARCHITECT

Red Hat K.K. <https://www.redhat.com/>

no...@redhat.com



