etcdctl -C https://openshift-balancer01:2379,https://openshift-balancer02:2379 --ca-file=/etc/origin/master/master.etcd-ca.crt --cert-file=/etc/origin/master/master.etcd-client.crt --key-file=/etc/origin/master/master.etcd-client.key member list
12c8a31c8fcae0d4: name=openshift-balancer02 peerURLs=https://XXXX:2380 clientURLs=https://XXXX:2379
bf80ee3a26e8772c: name=openshift-balancer01 peerURLs=https://XXXX:2380 clientURLs=https://XXXX:2379

member list is ok

cluster health tells me what i already know :(

etcdctl -C https://openshift-balancer01:2379,https://openshift-balancer02:2379 --ca-file=/etc/origin/master/master.etcd-ca.crt --cert-file=/etc/origin/master/master.etcd-client.crt --key-file=/etc/origin/master/master.etcd-client.key cluster-health
member 12c8a31c8fcae0d4 is unhealthy: got unhealthy result from https://XXXX:2379
failed to check the health of member bf80ee3a26e8772c on https://XXXX:2379: Get https://XXXX:2379/health: dial tcp XXXX:2379: i/o timeout
member bf80ee3a26e8772c is unreachable: [https://XXXX:2379] are all unreachable

the "main etcd" is halted right now

Thanks!

> On 21 Jun 2016, at 17:45, Julio Saura <[email protected]> wrote:
>
> regarding the certs, i used ansible to install origin so i guess ansible
> should have done it right …
>
>> On 21 Jun 2016, at 15:29, Julio Saura <[email protected]> wrote:
>>
>> hello
>>
>> yes, they are synced with an internal NTP server ..
>>
>> gonna try etcdctl thanks!
>>
>>> On 21 Jun 2016, at 15:20, Jason DeTiberus <[email protected]> wrote:
>>>
>>> On Tue, Jun 21, 2016 at 7:28 AM, Julio Saura <[email protected]> wrote:
>>>> yes
>>>>
>>>> working
>>>>
>>>> [root@openshift-master01 ~]# telnet XXXXX 2380
>>>> Trying XXXX...
>>>> Connected to XXXX.
>>>> Escape character is '^]'.
>>>> ^CConnection closed by foreign host.
>>>
>>> Have you verified that time is synced between the hosts? I'd also check
>>> the peer certs between the hosts... Can you connect to the hosts using
>>> etcdctl? There should be a status command that will give you more
>>> information.
>>>
>>>> On 21 Jun 2016, at 13:21, Jason DeTiberus <[email protected]> wrote:
>>>>
>>>> Did you verify connectivity over the peering port as well (2380)?
>>>>
>>>> On Jun 21, 2016 7:17 AM, "Julio Saura" <[email protected]> wrote:
>>>>>
>>>>> hello
>>>>>
>>>>> same problem
>>>>>
>>>>> jun 21 13:11:03 openshift-master01 atomic-openshift-master-api[59618]: F0621 13:11:03.155246 59618 auth.go:141] error #0: dial tcp XXXX:2379: connection refused ( the one i rebooted )
>>>>> jun 21 13:11:03 openshift-master01 atomic-openshift-master-api[59618]: error #1: client: etcd member https://YYYY:2379 has no leader
>>>>>
>>>>> i rebooted the etcd server and my master is not able to use the other one
>>>>>
>>>>> still able to connect from both masters using telnet to the etcd port ..
>>>>>
>>>>> any clue? this is weird.
>>>>>
>>>>>> On 14 Jun 2016, at 9:28, Julio Saura <[email protected]> wrote:
>>>>>>
>>>>>> hello
>>>>>>
>>>>>> yes it is correct .. it was the first thing i checked ..
>>>>>>
>>>>>> first master
>>>>>>
>>>>>> etcdClientInfo:
>>>>>>   ca: master.etcd-ca.crt
>>>>>>   certFile: master.etcd-client.crt
>>>>>>   keyFile: master.etcd-client.key
>>>>>>   urls:
>>>>>>   - https://openshift-balancer01:2379
>>>>>>   - https://openshift-balancer02:2379
>>>>>>
>>>>>> second master
>>>>>>
>>>>>> etcdClientInfo:
>>>>>>   ca: master.etcd-ca.crt
>>>>>>   certFile: master.etcd-client.crt
>>>>>>   keyFile: master.etcd-client.key
>>>>>>   urls:
>>>>>>   - https://openshift-balancer01:2379
>>>>>>   - https://openshift-balancer02:2379
>>>>>>
>>>>>> dns names resolve in both masters
>>>>>>
>>>>>> Best regards and thanks!
>>>>>>
>>>>>>> On 13 Jun 2016, at 18:45, Scott Dodson <[email protected]> wrote:
>>>>>>>
>>>>>>> Can you verify the connection information in the etcdClientInfo section of
>>>>>>> /etc/origin/master/master-config.yaml is correct?
>>>>>>>
>>>>>>> On Mon, Jun 13, 2016 at 11:56 AM, Julio Saura <[email protected]> wrote:
>>>>>>>> hello
>>>>>>>>
>>>>>>>> yes.. i have an external balancer in front of my masters for HA as the doc
>>>>>>>> says.
>>>>>>>>
>>>>>>>> i don’t have any balancer in front of my etcd servers for masters
>>>>>>>> connection, it’s not necessary right? masters will try all available etcd
>>>>>>>> servers if one is down right?
>>>>>>>>
>>>>>>>> i don’t know why but none of my masters were able to connect to the
>>>>>>>> second etcd instance, but using telnet from their shell worked .. so it was
>>>>>>>> not a net or fw issue..
>>>>>>>>
>>>>>>>> best regards.
>>>>>>>>
>>>>>>>>> On 13 Jun 2016, at 17:53, Clayton Coleman <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> I have not seen that particular issue. Do you have a load balancer in
>>>>>>>>> between your masters and etcd?
>>>>>>>>>
>>>>>>>>> On Fri, Jun 10, 2016 at 5:55 AM, Julio Saura <[email protected]> wrote:
>>>>>>>>>> hello
>>>>>>>>>>
>>>>>>>>>> i have an origin 3.1 installation working cool so far
>>>>>>>>>>
>>>>>>>>>> today one of my etcd nodes ( 1 of 2 ) crashed and i started having
>>>>>>>>>> problems..
>>>>>>>>>>
>>>>>>>>>> i noticed on one of my master nodes that it was not able to connect
>>>>>>>>>> to the second etcd server and that the etcd server was not able to
>>>>>>>>>> promote itself as leader..
>>>>>>>>>>
>>>>>>>>>> jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 is starting a new election at term 10048
>>>>>>>>>> jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 became candidate at term 10049
>>>>>>>>>> jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 received vote from 12c8a31c8fcae0d4 at term 10049
>>>>>>>>>> jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 [logterm: 8, index: 4600461] sent vote request to bf80ee3a26e8772c at term 10049
>>>>>>>>>> jun 10 11:09:56 openshift-balancer02 etcd[47218]: got unexpected response error (etcdserver: request timed out)
>>>>>>>>>>
>>>>>>>>>> my masters logged that they were not able to connect to the etcd
>>>>>>>>>>
>>>>>>>>>> er.go:218] unexpected ListAndWatch error: pkg/storage/cacher.go:161: Failed to list *extensions.Job: error #0: dial tcp X.X.X.X:2379: connection refused
>>>>>>>>>>
>>>>>>>>>> so i tried a simple test, just telnet from masters to the etcd node
>>>>>>>>>> port ..
>>>>>>>>>>
>>>>>>>>>> [root@openshift-master01 log]# telnet X.X.X.X 2379
>>>>>>>>>> Trying X.X.X.X...
>>>>>>>>>> Connected to X.X.X.X.
>>>>>>>>>> Escape character is '^]'
>>>>>>>>>>
>>>>>>>>>> so i was able to connect from masters.
>>>>>>>>>>
>>>>>>>>>> i was not able to recover my oc masters until the first etcd node
>>>>>>>>>> rebooted .. so it seems my etcd “cluster” is not working without the
>>>>>>>>>> first node ..
>>>>>>>>>>
>>>>>>>>>> any clue?
>>>>>>>>>>
>>>>>>>>>> thanks
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> users mailing list
>>>>>>>>>> [email protected]
>>>>>>>>>> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>>>
>>> --
>>> Jason DeTiberus
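A note on the symptoms in this thread: etcd elects its leader by Raft majority vote, so a cluster of n members needs floor(n/2) + 1 of them reachable. With the two members listed above that number is 2, which would be consistent with openshift-balancer02 voting only for itself and never becoming leader while the other member is halted. The Python sketch below is only an illustration of that arithmetic applied to the cluster-health output quoted in this message; the helper names (quorum, fault_tolerance, tally) are hypothetical and are not part of etcdctl.

```python
import re


def quorum(members: int) -> int:
    """Votes needed for a Raft majority in a cluster of `members` nodes."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can be down while a majority is still available."""
    return members - quorum(members)


# Matches the "member <id> is <state>" lines as they appear in the
# cluster-health output quoted in this thread (assumed format).
MEMBER_LINE = re.compile(
    r"^member (?P<id>[0-9a-f]+) is (?P<state>healthy|unhealthy|unreachable)\b"
)


def tally(output: str) -> dict:
    """Map each member ID to the state reported for it."""
    states = {}
    for line in output.splitlines():
        m = MEMBER_LINE.match(line.strip())
        if m:
            states[m.group("id")] = m.group("state")
    return states


# Output copied from the message above (addresses already masked there).
SAMPLE = """\
member 12c8a31c8fcae0d4 is unhealthy: got unhealthy result from https://XXXX:2379
failed to check the health of member bf80ee3a26e8772c on https://XXXX:2379: Get https://XXXX:2379/health: dial tcp XXXX:2379: i/o timeout
member bf80ee3a26e8772c is unreachable: [https://XXXX:2379] are all unreachable
"""

if __name__ == "__main__":
    states = tally(SAMPLE)
    reachable = sum(1 for s in states.values() if s != "unreachable")
    print("members:", len(states), "reachable:", reachable)
    print("quorum needed:", quorum(len(states)))
    print("failures tolerated:", fault_tolerance(len(states)))
```

By the same arithmetic a three-member cluster needs 2 votes and can lose one member, which is why odd cluster sizes are the usual recommendation for etcd.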
