Re: weird issue with etcd

Scott Dodson Mon, 13 Jun 2016 09:48:17 -0700

Can you verify that the connection information in the etcdClientInfo section of
/etc/origin/master/master-config.yaml is correct?
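One way to act on this: in 3.x installs the etcdClientInfo section usually carries a `urls` list of etcd client endpoints next to the CA and client certificate paths, and the masters can only use the members listed there. The sketch below assumes you have copied those `urls` entries into a Python list by hand; `missing_etcd_members` and the example.com hostnames are hypothetical. It reports any expected member (host:port) that the list does not mention.

```python
from urllib.parse import urlsplit


def missing_etcd_members(client_urls, expected_members):
    """Return expected etcd members ("host:port") absent from client_urls.

    client_urls: URLs copied by hand from etcdClientInfo.urls (assumed input).
    expected_members: "host:port" strings for every etcd member you run.
    """
    configured = set()
    for url in client_urls:
        parts = urlsplit(url)
        # Fall back to etcd's standard client port when the URL omits one.
        port = parts.port or 2379
        configured.add(f"{parts.hostname}:{port}")
    return sorted(set(expected_members) - configured)


# Hypothetical values: only one of the two etcd members is configured.
urls = ["https://etcd1.example.com:2379"]
members = ["etcd1.example.com:2379", "etcd2.example.com:2379"]
print(missing_etcd_members(urls, members))
```

An empty result means every member you expect is at least listed; it says nothing about certificates or reachability.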


On Mon, Jun 13, 2016 at 11:56 AM, Julio Saura <[email protected]> wrote:
> Hello,
>
> Yes, I have an external balancer in front of my masters for HA, as the docs say.
>
> I don't have any balancer in front of my etcd servers for the masters' connection;
> it's not necessary, right? The masters will try all available etcd servers if one
> is down, right?
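On whether the masters fail over between etcd servers: a client that is given several endpoints can move on to the next one when a connection attempt fails. The sketch below only illustrates that idea with a hypothetical `first_reachable` helper and a caller-supplied `probe` function; it is not the etcd client library's actual code. Failover of this kind depends on every member being listed in the client configuration, and it cannot make writes succeed if the etcd cluster itself has no leader.

```python
from typing import Callable, Iterable, Optional


def first_reachable(endpoints: Iterable[str],
                    probe: Callable[[str], bool]) -> Optional[str]:
    """Return the first endpoint for which probe() succeeds, or None.

    Illustrative failover loop only; real etcd clients have their own logic.
    """
    for endpoint in endpoints:
        if probe(endpoint):
            return endpoint
    return None


# Hypothetical endpoints; pretend the first member has crashed.
down = {"https://etcd1.example.com:2379"}
endpoints = ["https://etcd1.example.com:2379", "https://etcd2.example.com:2379"]
print(first_reachable(endpoints, lambda e: e not in down))
```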
>
> I don't know why, but none of my masters were able to connect to the second
> etcd instance, yet telnet from their shells worked, so it was not a
> network or firewall issue.
>
>
> Best regards.
>
>> On 13 Jun 2016, at 17:53, Clayton Coleman <[email protected]> wrote:
>>
>> I have not seen that particular issue.  Do you have a load balancer in
>> between your masters and etcd?
>>
>> On Fri, Jun 10, 2016 at 5:55 AM, Julio Saura <[email protected]> wrote:
>>> Hello,
>>>
>>> I have an Origin 3.1 installation that has been working well so far.
>>>
>>> Today one of my etcd nodes (1 of 2) crashed and I started having
>>> problems.
>>>
>>> I noticed on one of my master nodes that it was not able to connect to the
>>> second etcd server, and that this etcd server could not promote itself to
>>> leader:
>>>
>>>
>>> jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 is starting a new election at term 10048
>>> jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 became candidate at term 10049
>>> jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 received vote from 12c8a31c8fcae0d4 at term 10049
>>> jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 [logterm: 8, index: 4600461] sent vote request to bf80ee3a26e8772c at term 10049
>>> jun 10 11:09:56 openshift-balancer02 etcd[47218]: got unexpected response error (etcdserver: request timed out)
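These log lines fit a lost-quorum picture. etcd uses the Raft consensus algorithm, in which a candidate must collect votes from a majority of members, floor(n/2) + 1, before it becomes leader. With 2 members the majority is 2, so after one member crashes the survivor votes for itself, cannot get the second vote, and keeps starting new elections at higher terms, which is consistent with the log above. The arithmetic is sketched below; `quorum`, `fault_tolerance` and `can_elect_leader` are illustrative helper names, not etcd APIs.

```python
def quorum(members: int) -> int:
    """Votes needed for a Raft majority in a cluster of `members` nodes."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can be lost while a majority is still reachable."""
    return members - quorum(members)


def can_elect_leader(members: int, alive: int) -> bool:
    """A leader can only be elected if the live members form a majority."""
    return alive >= quorum(members)


for n in (1, 2, 3, 5):
    print(f"{n} members: quorum={quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")

# The situation in the logs: 2 members, 1 crashed.
print(can_elect_leader(members=2, alive=1))  # False
```

This is also why odd cluster sizes are the usual recommendation: a 2-member cluster tolerates no more failures than a single member, while a 3-member cluster survives the loss of one.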
>>>
>>> My masters logged that they were not able to connect to etcd:
>>>
>>> er.go:218] unexpected ListAndWatch error: pkg/storage/cacher.go:161: Failed to list *extensions.Job: error #0: dial tcp X.X.X.X:2379: connection refused
>>>
>>> So I tried a simple test: telnet from the masters to the etcd node's port.
>>>
>>> [root@openshift-master01 log]# telnet X.X.X.X 2379
>>> Trying X.X.X.X...
>>> Connected to X.X.X.X.
>>> Escape character is '^]'.
>>>
>>> So I was able to connect from the masters.
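A scripted version of the telnet test, using Python's standard `socket.create_connection`, is below; `tcp_port_open` is a hypothetical helper name. Like telnet, it only proves that something accepts TCP connections on the port. An etcd member that has lost quorum can still accept connections while its requests time out (compare the `etcdserver: request timed out` line in the logs), so a successful connect does not show that the member can actually serve the masters.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Equivalent in spirit to `telnet host port`: it says nothing about
    whether the service behind the port is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts (socket.timeout is an OSError
        # subclass) and unresolvable hosts (socket.gaierror).
        return False


# Example (placeholder address): tcp_port_open("X.X.X.X", 2379)
```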
>>>
>>> I was not able to recover my OpenShift masters until the first etcd node rebooted,
>>> so it seems my etcd “cluster” does not work without the first node.
>>>
>>> Any clue?
>>>
>>> Thanks
>>>
>>>
>>> _______________________________________________
>>> users mailing list
>>> [email protected]
>>> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
