Re: weird issue with etcd

Julio Saura Tue, 14 Jun 2016 00:31:49 -0700

Hello,

Yes, it is correct. It was the first thing I checked.


First master:

etcdClientInfo:
 ca: master.etcd-ca.crt
 certFile: master.etcd-client.crt
 keyFile: master.etcd-client.key
 urls:
   - https://openshift-balancer01:2379
   - https://openshift-balancer02:2379


Second master:

etcdClientInfo:
 ca: master.etcd-ca.crt
 certFile: master.etcd-client.crt
 keyFile: master.etcd-client.key
 urls:
   - https://openshift-balancer01:2379
   - https://openshift-balancer02:2379
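
Both masters point at the same two etcd members. As a side note, here is a minimal sketch of the majority arithmetic that Raft-based stores such as etcd rely on. It may help explain why a two-member cluster cannot elect a leader while one member is down. The function names quorum and tolerated_failures are made up for illustration and are not part of etcd or OpenShift.

```python
def quorum(members: int) -> int:
    """Smallest majority of a Raft cluster with `members` voting members."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be down while a majority is still reachable."""
    return members - quorum(members)


if __name__ == "__main__":
    # Print cluster size, required majority, and tolerated failures.
    for n in (1, 2, 3, 5):
        print(n, quorum(n), tolerated_failures(n))
```

With two members the majority is 2, so no failure is tolerated. Three members tolerate one failure.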

The DNS names resolve on both masters.
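
To make that check repeatable, a rough sketch like the following resolves the hostname of an etcd URL and attempts a plain TCP connect, much like telnet does. check_endpoint is a hypothetical helper, not an OpenShift or etcd tool. It deliberately skips the TLS handshake with the client certificate that the masters use, so "tcp ok" only shows that the port is reachable.

```python
import socket
from urllib.parse import urlsplit


def check_endpoint(url: str, timeout: float = 3.0) -> str:
    """Resolve the host in `url` and try a plain TCP connect to its port."""
    try:
        parts = urlsplit(url)
        host, port = parts.hostname, parts.port
    except ValueError:
        # urlsplit / .port raise ValueError on a malformed URL or port.
        return "invalid url"
    if host is None or port is None:
        return "invalid url"
    try:
        # Name resolution only; no connection is made here.
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns failure"
    try:
        # Plain TCP connect, closed again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return "tcp ok"
    except OSError:
        return "tcp failure"
```

Calling check_endpoint("https://openshift-balancer01:2379") and the same for openshift-balancer02 from each master would cover the two entries in the urls list above.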

Best regards and thanks!


> On 13 Jun 2016, at 18:45, Scott Dodson <[email protected]> wrote:
> 
> Can you verify that the connection information in the etcdClientInfo section of
> /etc/origin/master/master-config.yaml is correct?
> 
> On Mon, Jun 13, 2016 at 11:56 AM, Julio Saura <[email protected]> wrote:
>> Hello,
>> 
>> Yes. I have an external balancer in front of my masters for HA, as the docs say.
>> 
>> I don’t have any balancer in front of my etcd servers for the masters’
>> connections. It’s not necessary, right? The masters will try all available
>> etcd servers if one is down, right?
>> 
>> I don’t know why, but none of my masters were able to connect to the second
>> etcd instance, yet telnet from their shell worked, so it was not a
>> network or firewall issue.
>> 
>> 
>> best regards.
>> 
>>> On 13 Jun 2016, at 17:53, Clayton Coleman <[email protected]> wrote:
>>> 
>>> I have not seen that particular issue.  Do you have a load balancer in
>>> between your masters and etcd?
>>> 
>>> On Fri, Jun 10, 2016 at 5:55 AM, Julio Saura <[email protected]> wrote:
>>>> Hello,
>>>> 
>>>> I have an Origin 3.1 installation that has been working well so far.
>>>> 
>>>> Today one of my etcd nodes (1 of 2) crashed and I started having
>>>> problems.
>>>> 
>>>> I noticed that one of my master nodes was not able to connect to the
>>>> second etcd server, and that this etcd server was not able to promote
>>>> itself to leader.
>>>> 
>>>> 
>>>> jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 is 
>>>> starting a new election at term 10048
>>>> jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 became 
>>>> candidate at term 10049
>>>> jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 
>>>> received vote from 12c8a31c8fcae0d4 at term 10049
>>>> jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 
>>>> [logterm: 8, index: 4600461] sent vote request to bf80ee3a26e8772c at term 
>>>> 10049
>>>> jun 10 11:09:56 openshift-balancer02 etcd[47218]: got unexpected response 
>>>> error (etcdserver: request timed out)
>>>> 
>>>> My masters logged that they were not able to connect to etcd:
>>>> 
>>>> er.go:218] unexpected ListAndWatch error: pkg/storage/cacher.go:161: 
>>>> Failed to list *extensions.Job: error #0: dial tcp X.X.X.X:2379: 
>>>> connection refused
>>>> 
>>>> So I tried a simple test, just telnet from the masters to the etcd node port:
>>>> 
>>>> [root@openshift-master01 log]# telnet X.X.X.X 2379
>>>> Trying X.X.X.X...
>>>> Connected to X.X.X.X.
>>>> Escape character is '^]'
>>>> 
>>>> So I was able to connect from the masters.
>>>> 
>>>> I was not able to recover my oc masters until the first etcd node rebooted,
>>>> so it seems my etcd “cluster” does not work without the first node.
>>>> 
>>>> Any clue?
>>>> 
>>>> Thanks
>>>> 
>>>> 
>>>> _______________________________________________
>>>> users mailing list
>>>> [email protected]
>>>> http://lists.openshift.redhat.com/openshiftmm/listinfo/users

