weird issue with etcd

Julio Saura Fri, 10 Jun 2016 02:57:55 -0700

hello

I have an Origin 3.1 installation that has been working well so far.


Today one of my etcd nodes (1 of 2) crashed and I started having problems.

I noticed that one of my master nodes could not connect to the second
etcd server, and that this etcd server was not able to become leader:


jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 is starting a 
new election at term 10048
jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 became 
candidate at term 10049
jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 received 
vote from 12c8a31c8fcae0d4 at term 10049
jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 [logterm: 8, 
index: 4600461] sent vote request to bf80ee3a26e8772c at term 10049
jun 10 11:09:56 openshift-balancer02 etcd[47218]: got unexpected response error 
(etcdserver: request timed out)
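
For context on the election log above: etcd uses Raft, where a candidate needs votes from a majority of the cluster members to become leader. The candidate here has its own vote and is waiting on bf80ee3a26e8772c. A minimal sketch of the majority arithmetic, assuming the cluster really has exactly the two members described in this message (the function names are illustrative, not part of etcd):

```python
def quorum(members: int) -> int:
    """Votes needed for a majority in a cluster of `members` nodes."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be down while a majority is still reachable."""
    return members - quorum(members)


if __name__ == "__main__":
    # Print cluster size, majority size, and failures tolerated.
    for n in (1, 2, 3, 5):
        print(n, quorum(n), tolerated_failures(n))
```

With 2 members the majority is 2, so under that assumption a single surviving member cannot win an election on its own vote; with 3 members the majority is still 2 and one failure is tolerated.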

My masters logged that they could not connect to etcd:

er.go:218] unexpected ListAndWatch error: pkg/storage/cacher.go:161: Failed to 
list *extensions.Job: error #0: dial tcp X.X.X.X:2379: connection refused

So I tried a simple test: a telnet from the masters to the etcd node's port.

[root@openshift-master01 log]# telnet X.X.X.X 2379
Trying X.X.X.X...
Connected to X.X.X.X.
Escape character is '^]'.

So the masters were able to open a connection to that port.
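
The same reachability test can be scripted so that it tells "connection refused" (as in the master log) apart from a timeout. This is only a sketch using the Python standard library; `check_tcp` is a hypothetical helper name, and a successful connect only shows that the TCP port accepts connections, not that etcd is able to serve requests:

```python
import socket


def check_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Try one TCP connect and report 'open', 'refused', 'timeout' or 'error: ...'."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Nothing is listening on the port (or the connection was rejected).
        return "refused"
    except socket.timeout:
        # No answer within `timeout` seconds, e.g. packets silently dropped.
        return "timeout"
    except OSError as exc:
        # Any other network error (unreachable host, DNS failure, ...).
        return f"error: {exc}"
```

For example, `check_tcp("X.X.X.X", 2379)` could be run from each master against each etcd node, with the real address in place of X.X.X.X.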

I was not able to recover my OpenShift masters until the first etcd node had
rebooted, so it seems my etcd "cluster" does not work without the first node.

Any clue?

Thanks


_______________________________________________
users mailing list
[email protected]
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
