Can you verify that the connection information in the etcdClientInfo section of /etc/origin/master/master-config.yaml is correct?
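
For reference, here is a rough sketch of what that section usually looks like on a master that talks to an external etcd cluster. The hostnames are placeholders I made up, and the certificate file names are the ones a default install typically generates, so treat both as assumptions and compare them against your own file. The main thing to check is that every etcd member is listed under urls, not only the first one.

```yaml
# /etc/origin/master/master-config.yaml (fragment, illustrative only)
etcdClientInfo:
  ca: master.etcd-ca.crt                # assumed default file name
  certFile: master.etcd-client.crt      # assumed default file name
  keyFile: master.etcd-client.key       # assumed default file name
  urls:
    - https://etcd01.example.com:2379   # hypothetical hostname, first etcd member
    - https://etcd02.example.com:2379   # hypothetical hostname, second etcd member
```

If only one URL shows up there, the masters would have no other endpoint to try when that member is down.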

On Mon, Jun 13, 2016 at 11:56 AM, Julio Saura <[email protected]> wrote:
> hello
>
> yes.. i have an external balancer in front of my masters for HA as the doc says.
>
> i don’t have any balancer in front of my etcd servers for the masters' connection,
> it’s not necessary, right? the masters will try all available etcd servers if one
> is down, right?
>
> i don’t know why, but none of my masters were able to connect to the second
> etcd instance, yet using telnet from their shell worked .. so it was not a
> net or fw issue..
>
> best regards.
>
>> On 13 Jun 2016, at 17:53, Clayton Coleman <[email protected]> wrote:
>>
>> I have not seen that particular issue. Do you have a load balancer in
>> between your masters and etcd?
>>
>> On Fri, Jun 10, 2016 at 5:55 AM, Julio Saura <[email protected]> wrote:
>>> hello
>>>
>>> i have an origin 3.1 installation working cool so far
>>>
>>> today one of my etcd nodes ( 1 of 2 ) crashed and i started having
>>> problems..
>>>
>>> i noticed on one of my master nodes that it was not able to connect to the
>>> second etcd server, and that the etcd server was not able to promote itself
>>> to leader..
>>>
>>> jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 is starting a new election at term 10048
>>> jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 became candidate at term 10049
>>> jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 received vote from 12c8a31c8fcae0d4 at term 10049
>>> jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 [logterm: 8, index: 4600461] sent vote request to bf80ee3a26e8772c at term 10049
>>> jun 10 11:09:56 openshift-balancer02 etcd[47218]: got unexpected response error (etcdserver: request timed out)
>>>
>>> my masters logged that they were not able to connect to etcd
>>>
>>> er.go:218] unexpected ListAndWatch error: pkg/storage/cacher.go:161: Failed to list *extensions.Job: error #0: dial tcp X.X.X.X:2379: connection refused
>>>
>>> so i tried a simple test, just telnet from the masters to the etcd node port ..
>>>
>>> [root@openshift-master01 log]# telnet X.X.X.X 2379
>>> Trying X.X.X.X...
>>> Connected to X.X.X.X.
>>> Escape character is '^]'.
>>>
>>> so i was able to connect from the masters.
>>>
>>> i was not able to recover my oc masters until the first etcd node rebooted
>>> .. so it seems my etcd “cluster” is not working without the first node ..
>>>
>>> any clue?
>>>
>>> thanks
>>>
>>> _______________________________________________
>>> users mailing list
>>> [email protected]
>>> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
