Re: prune tags

2016-06-21 Thread Clayton Coleman
Prune definitely knows that.  We also know the time the tag was last
updated.
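
For context, a hedged sketch of what the existing pruner can already do with
that information (flags as in the stock oadm client; the retention values are
just examples):

    # prune image data, keeping the 3 most recent revisions per tag and
    # anything pushed within the last hour:
    oadm prune images --keep-tag-revisions=3 --keep-younger-than=60m --confirm

What it does not do today is remove the tag names themselves, which is the gap
discussed in the quoted thread below.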

On Jun 21, 2016, at 8:57 PM, Philippe Lafoucrière <
philippe.lafoucri...@tech-angels.com> wrote:


On Tue, Jun 21, 2016 at 2:33 PM, Clayton Coleman 
wrote:

> I don't think we have anything to prune tags (since we don't know what
> tag names to prune).  We'd need some way of knowing the tag was "not
> important" before it could be pruned.
>

I would say: when no image stream or container is using a specific tag?
I admit it's a bit harder than just untagged layers for images :(
___
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users


Re: prune tags

2016-06-21 Thread Philippe Lafoucrière
On Tue, Jun 21, 2016 at 2:33 PM, Clayton Coleman 
wrote:

> I don't think we have anything to prune tags (since we don't know what
> tag names to prune).  We'd need some way of knowing the tag was "not
> important" before it could be pruned.
>

I would say: when no image stream or container is using a specific tag?
I admit it's a bit harder than just untagged layers for images :(
___
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users


Re: All my ipfailover pods are in "Entering MASTER STATE", is that normal?

2016-06-21 Thread Ram Ranganathan
Couldn't figure out from the email thread whether you have a problem or it
was just a question.

What does "ip addr show" report on all the nodes where your ipfailover pods
are running?
If the VIPs are allocated to both nodes (as the logs suggest), then it is
likely that some of the VRRP instances would be in MASTER state.

If that is not the case, can you please check that multicast is enabled
and that traffic (firewall/iptables) is allowed to 224.0.0.18?
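
A quick way to check both points (the VIP value is illustrative; assumes
iptables and the standard /etc/protocols entry mapping vrrp to protocol 112):

    # on each node running an ipfailover pod: does it currently hold the VIP?
    ip addr show | grep 192.168.1.100        # replace with your virtual IP

    # allow VRRP advertisements between the nodes (multicast group 224.0.0.18):
    iptables -I INPUT -d 224.0.0.18/32 -p vrrp -j ACCEPT

If each VIP shows up on exactly one node, keepalived is doing its job; if both
nodes claim the same VIP, the advertisements are probably being dropped.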

Thanks,

Ram//

On Mon, Jun 20, 2016 at 8:14 AM, Stéphane Klein  wrote:

> In this documentation
> https://github.com/acassen/keepalived/blob/master/doc/source/case_study_failover.rst#architecture-specification
> I read:
>
> « 4 VRRP Instances per LVS director: 2 VRRP Instance in the MASTER state
> and 2 in BACKUP state. We use a symmetric state on each LVS directors. »
>
> then I think it's normal to have many Master instances.
>
> 2016-06-20 9:35 GMT+02:00 Stéphane Klein :
>
>> I meant to ask: is this a problem, is it abnormal?
>>
>> 2016-06-17 16:26 GMT+02:00 Stéphane Klein :
>>
>>> Hi,
>>>
>>> I've:
>>>
>>> * one cluster with 2 nodes
>>> * ipfailover replicas=2
>>>
>>> I execute:
>>>
>>> * oc logs ipfailover-rbx-1-bh3kn
>>> https://gist.github.com/harobed/2ab152ed98f95285d549cbc7af3a#file-oc-logs-ipfailover-rbx-1-bh3kn
>>> * oc logs ipfailover-rbx-1-mmp36
>>> https://gist.github.com/harobed/2ab152ed98f95285d549cbc7af3a#file-oc-logs-ipfailover-rbx-1-mmp36
>>>
>>> and I see that all ipfailover pods are in "Entering MASTER STATE".
>>>
>>> Is that normal?
>>>
>>> Best regards,
>>> Stéphane
>>>
>>
>>
>>
>> --
>> Stéphane Klein 
>> blog: http://stephane-klein.info
>> cv : http://cv.stephane-klein.info
>> Twitter: http://twitter.com/klein_stephane
>>
>
>
>
> --
> Stéphane Klein 
> blog: http://stephane-klein.info
> cv : http://cv.stephane-klein.info
> Twitter: http://twitter.com/klein_stephane
>
> ___
> users mailing list
> users@lists.openshift.redhat.com
> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>
>


-- 
Ram//
___
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users


Re: prune tags

2016-06-21 Thread Clayton Coleman
I don't think we have anything to prune tags (since we don't know what
tag names to prune).  We'd need some way of knowing the tag was "not
important" before it could be pruned.

On Tue, Jun 21, 2016 at 2:21 PM, Philippe Lafoucrière
 wrote:
> Hi,
>
> As we're using openshift for continuous deployments, we often have a lot of
> tags per ImageStream (to test branches). We would love to see these unused
> tags pruned, like other objects.
> Is there something in the roadmap?
>
> Thanks,
> Philippe
>
> --
> Philippe Lafoucrière - CEO
> http://www.tech-angels.com
> https://gemnasium.com
> main : +33 (0) 970 444 643
> mobile CA: +1 (581) 986-7540
> mobile FR: +33 (0) 6 72 63 75 40
>
>
> ___
> users mailing list
> users@lists.openshift.redhat.com
> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>

___
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users


prune tags

2016-06-21 Thread Philippe Lafoucrière
Hi,

As we're using openshift for continuous deployments, we often have a lot of
tags per ImageStream (to test branches). We would love to see these unused
tags pruned, like other objects.
Is there something in the roadmap?
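
In the meantime, a hedged sketch of cleaning up by hand (stream and tag names
are illustrative; assumes an oc client that has "oc tag -d"):

    # see which tags have accumulated on a stream:
    oc describe is myapp

    # drop a branch tag that is no longer needed:
    oc tag -d myapp:feature-xyz

    # then let the regular pruner reclaim unreferenced image data:
    oadm prune images --confirm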

Thanks,
Philippe

-- 
Philippe Lafoucrière - CEO
http://www.tech-angels.com
https://gemnasium.com
main : +33 (0) 970 444 643
mobile CA: +1 (581) 986-7540
mobile FR: +33 (0) 6 72 63 75 40
___
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users


RE: MySQL: Readiness probe failed

2016-06-21 Thread Den Cowboy
So it seems mysql isn't able to resolve the service name "mysql" to its IP,
because this works inside the container:
mysql -u user -h 172.30.183.216 -p
password: xx
---> mysql (works)

This doesn't work:
mysql -u user -h mysql -p
password: xx
--> ERROR: unknown MySQL server host 'mysql'

mysql is the service name and service IP is 172.30.183.216
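
A quick way to confirm it is a name-resolution problem rather than a MySQL
problem (the pod name is hypothetical; the image may not ship every tool shown):

    # open a shell inside the mysql pod:
    oc rsh mysql-1-abcde

    # does the service name resolve at all?
    getent hosts mysql
    nslookup mysql

    # the service is also exposed through environment variables injected at pod start:
    echo $MYSQL_SERVICE_HOST $MYSQL_SERVICE_PORT

If the environment variables are set but the name does not resolve, the pod's
DNS (SkyDNS / the search path in /etc/resolv.conf) is the place to look.
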
here are all the logs of the container:

Warning: Can't detect memory limit from cgroups
Warning: Can't detect number of CPU cores from cgroups
---> 16:19:51 Processing MySQL configuration files ...
---> 16:19:51 Initializing database ...
---> 16:19:51 Running mysql_install_db ...
2016-06-21 16:19:51 0 [Warning] TIMESTAMP with implicit DEFAULT value is 
deprecated. Please use --explicit_defaults_for_timestamp server option (see 
documentation for more details).
2016-06-21 16:19:51 0 [Note] /opt/rh/rh-mysql56/root/usr/libexec/mysqld (mysqld 
5.6.26) starting as process 17 ...
2016-06-21 16:19:51 663521f0b840 InnoDB: Warning: Using 
innodb_additional_mem_pool_size is DEPRECATED. This option may be removed in 
future releases, together with the option innodb_use_sys_malloc and with the 
InnoDB's internal memory allocator.
2016-06-21 16:19:51 17 [Note] InnoDB: Using atomics to ref count buffer pool 
pages
2016-06-21 16:19:51 17 [Note] InnoDB: The InnoDB memory heap is disabled
2016-06-21 16:19:51 17 [Note] InnoDB: Mutexes and rw_locks use GCC atomic 
builtins
2016-06-21 16:19:51 17 [Note] InnoDB: Memory barrier is not used
2016-06-21 16:19:51 17 [Note] InnoDB: Compressed tables use zlib 1.2.7
2016-06-21 16:19:51 17 [Note] InnoDB: Using Linux native AIO
2016-06-21 16:19:51 17 [Note] InnoDB: Using CPU crc32 instructions
2016-06-21 16:19:51 17 [Note] InnoDB: Initializing buffer pool, size = 32.0M
2016-06-21 16:19:51 17 [Note] InnoDB: Completed initialization of buffer pool
2016-06-21 16:19:51 17 [Note] InnoDB: The first specified data file ./ibdata1 
did not exist: a new database to be created!
2016-06-21 16:19:51 17 [Note] InnoDB: Setting file ./ibdata1 size to 12 MB
2016-06-21 16:19:51 17 [Note] InnoDB: Database physically writes the file full: 
wait...
2016-06-21 16:19:51 17 [Note] InnoDB: Setting log file ./ib_logfile101 size to 
8 MB
2016-06-21 16:19:52 17 [Note] InnoDB: Setting log file ./ib_logfile1 size to 8 
MB
2016-06-21 16:19:52 17 [Note] InnoDB: Renaming log file ./ib_logfile101 to 
./ib_logfile0
2016-06-21 16:19:52 17 [Warning] InnoDB: New log files created, LSN=45781
2016-06-21 16:19:52 17 [Note] InnoDB: Doublewrite buffer not found: creating new
2016-06-21 16:19:52 17 [Note] InnoDB: Doublewrite buffer created
2016-06-21 16:19:52 17 [Note] InnoDB: 128 rollback segment(s) are active.
2016-06-21 16:19:52 17 [Warning] InnoDB: Creating foreign key constraint system 
tables.
2016-06-21 16:19:52 17 [Note] InnoDB: Foreign key constraint system tables 
created
2016-06-21 16:19:52 17 [Note] InnoDB: Creating tablespace and datafile system 
tables.
2016-06-21 16:19:52 17 [Note] InnoDB: Tablespace and datafile system tables 
created.
2016-06-21 16:19:52 17 [Note] InnoDB: Waiting for purge to start
2016-06-21 16:19:52 17 [Note] InnoDB: 5.6.26 started; log sequence number 0
2016-06-21 16:19:52 17 [Note] RSA private key file not found: 
/var/lib/mysql/data//private_key.pem. Some authentication plugins will not work.
2016-06-21 16:19:52 17 [Note] RSA public key file not found: 
/var/lib/mysql/data//public_key.pem. Some authentication plugins will not work.
2016-06-21 16:19:55 17 [Note] Binlog end
2016-06-21 16:19:55 17 [Note] InnoDB: FTS optimize thread exiting.
2016-06-21 16:19:55 17 [Note] InnoDB: Starting shutdown...
2016-06-21 16:19:56 17 [Note] InnoDB: Shutdown completed; log sequence number 
1625977


2016-06-21 16:19:56 0 [Warning] TIMESTAMP with implicit DEFAULT value is 
deprecated. Please use --explicit_defaults_for_timestamp server option (see 
documentation for more details).
2016-06-21 16:19:56 0 [Note] /opt/rh/rh-mysql56/root/usr/libexec/mysqld (mysqld 
5.6.26) starting as process 39 ...
2016-06-21 16:19:56 6ef05c1a6840 InnoDB: Warning: Using 
innodb_additional_mem_pool_size is DEPRECATED. This option may be removed in 
future releases, together with the option innodb_use_sys_malloc and with the 
InnoDB's internal memory allocator.
2016-06-21 16:19:56 39 [Note] InnoDB: Using atomics to ref count buffer pool 
pages
2016-06-21 16:19:56 39 [Note] InnoDB: The InnoDB memory heap is disabled
2016-06-21 16:19:56 39 [Note] InnoDB: Mutexes and rw_locks use GCC atomic 
builtins
2016-06-21 16:19:56 39 [Note] InnoDB: Memory barrier is not used
2016-06-21 16:19:56 39 [Note] InnoDB: Compressed tables use zlib 1.2.7
2016-06-21 16:19:56 39 [Note] InnoDB: Using Linux native AIO
2016-06-21 16:19:56 39 [Note] InnoDB: Using CPU crc32 instructions
2016-06-21 16:19:56 39 [Note] InnoDB: Initializing buffer pool, size = 32.0M
2016-06-21 16:19:56 39 [Note] InnoDB: Completed initialization of buffer pool
2016-06-21 16:19:56 39 [Note] InnoDB: Highest supported file format is 

Re: weird issue with etcd

2016-06-21 Thread Julio Saura
etcdctl -C https://openshift-balancer01:2379,https://openshift-balancer02:2379 
--ca-file=/etc/origin/master/master.etcd-ca.crt 
--cert-file=/etc/origin/master/master.etcd-client.crt 
--key-file=/etc/origin/master/master.etcd-client.key member list


12c8a31c8fcae0d4: name=openshift-balancer02 peerURLs=https://:2380 
clientURLs=https://:2379
bf80ee3a26e8772c: name=openshift-balancer01 peerURLs=https://:2380 
clientURLs=https://:2379 



member list is ok

cluster health tells me what i already know :(

etcdctl -C https://openshift-balancer01:2379,https://openshift-balancer02:2379 
--ca-file=/etc/origin/master/master.etcd-ca.crt 
--cert-file=/etc/origin/master/master.etcd-client.crt 
--key-file=/etc/origin/master/master.etcd-client.key cluster-health

member 12c8a31c8fcae0d4 is unhealthy: got unhealthy result from 
https://:2379
failed to check the health of member bf80ee3a26e8772c on https://:2379: Get 
https://:2379/health: dial tcp :2379: i/o timeout
member bf80ee3a26e8772c is unreachable: [https://:2379] are all unreachable

the "main etcd" is halted right now 

Thanks!





> El 21 jun 2016, a las 17:45, Julio Saura  escribió:
> 
> regarding the certs, i used ansible to install origin so i guess ansible 
> should have done it right …
> 
> 
>> El 21 jun 2016, a las 15:29, Julio Saura > > escribió:
>> 
>> hello
>> 
>> yes, they are synced with an internal NTP server ..
>> 
>> gonna try etcdctl, thanks!
>> 
>> 
>>> El 21 jun 2016, a las 15:20, Jason DeTiberus >> > escribió:
>>> 
>>> On Tue, Jun 21, 2016 at 7:28 AM, Julio Saura >> > wrote:
 yes
 
 working
 
 [root@openshift-master01 ~]# telnet X 2380
 Trying ...
 Connected to .
 Escape character is '^]'.
 ^CConnection closed by foreign host.
>>> 
>>> 
>>> Have you verified that time is syncd between the hosts? I'd also check
>>> the peer certs between the hosts... Can you connect to the hosts using
>>> etcdctl? There should be a status command that will give you more
>>> information.
>>> 
 
 
 El 21 jun 2016, a las 13:21, Jason DeTiberus > escribió:
 
 Did you verify connectivity over the peering port as well (2380)?
 
 On Jun 21, 2016 7:17 AM, "Julio Saura" > wrote:
> 
> hello
> 
> same problem
> 
> jun 21 13:11:03 openshift-master01 atomic-openshift-master-api[59618]:
> F0621 13:11:03.155246   59618 auth.go:141] error #0: dial tcp :2379:
> connection refused ( the one i rebooted )
> jun 21 13:11:03 openshift-master01 atomic-openshift-master-api[59618]:
> error #1: client: etcd member https://:2379  has 
> no leader
> 
> i rebooted the etcd server and my master is not able to use other one
> 
> still able to connect from both masters using telnet to the etcd port ..
> 
> any clue? this is weird.
> 
> 
>> El 14 jun 2016, a las 9:28, Julio Saura > > escribió:
>> 
>> hello
>> 
>> yes is correct .. it was the first thing i checked ..
>> 
>> first master
>> 
>> etcdClientInfo:
>> ca: master.etcd-ca.crt
>> certFile: master.etcd-client.crt
>> keyFile: master.etcd-client.key
>> urls:
>>  - https://openshift-balancer01:2379 
>>  - https://openshift-balancer02:2379 
>> 
>> 
>> second master
>> 
>> etcdClientInfo:
>> ca: master.etcd-ca.crt
>> certFile: master.etcd-client.crt
>> keyFile: master.etcd-client.key
>> urls:
>>  - https://openshift-balancer01:2379 
>>  - https://openshift-balancer02:2379 
>> 
>> dns names resolve in both masters
>> 
>> Best regards and thanks!
>> 
>> 
>>> El 13 jun 2016, a las 18:45, Scott Dodson >> >
>>> escribió:
>>> 
>>> Can you verify the connection information etcdClientInfo section in
>>> /etc/origin/master/master-config.yaml is correct?
>>> 
>>> On Mon, Jun 13, 2016 at 11:56 AM, Julio Saura >> >
>>> wrote:
 hello
 
 yes.. i have a external balancer in front of my masters for HA as doc
 says.
 
 i don’t have any balancer in front of my etcd servers for masters
 connection, it’s not necessary right? masters will try all etcd 
 availables
 it one is down right?
 
 i don’t know why but none of 

Re: weird issue with etcd

2016-06-21 Thread Julio Saura
regarding the certs, i used ansible to install origin so i guess ansible should 
have done it right …


> El 21 jun 2016, a las 15:29, Julio Saura  escribió:
> 
> hello
> 
> yes, they are synced with an internal NTP server ..
> 
> gonna try etcdctl, thanks!
> 
> 
>> El 21 jun 2016, a las 15:20, Jason DeTiberus > > escribió:
>> 
>> On Tue, Jun 21, 2016 at 7:28 AM, Julio Saura > > wrote:
>>> yes
>>> 
>>> working
>>> 
>>> [root@openshift-master01 ~]# telnet X 2380
>>> Trying ...
>>> Connected to .
>>> Escape character is '^]'.
>>> ^CConnection closed by foreign host.
>> 
>> 
>> Have you verified that time is syncd between the hosts? I'd also check
>> the peer certs between the hosts... Can you connect to the hosts using
>> etcdctl? There should be a status command that will give you more
>> information.
>> 
>>> 
>>> 
>>> El 21 jun 2016, a las 13:21, Jason DeTiberus >> > escribió:
>>> 
>>> Did you verify connectivity over the peering port as well (2380)?
>>> 
>>> On Jun 21, 2016 7:17 AM, "Julio Saura" >> > wrote:
 
 hello
 
 same problem
 
 jun 21 13:11:03 openshift-master01 atomic-openshift-master-api[59618]:
 F0621 13:11:03.155246   59618 auth.go:141] error #0: dial tcp :2379:
 connection refused ( the one i rebooted )
 jun 21 13:11:03 openshift-master01 atomic-openshift-master-api[59618]:
 error #1: client: etcd member https://:2379  has 
 no leader
 
 i rebooted the etcd server and my master is not able to use other one
 
 still able to connect from both masters using telnet to the etcd port ..
 
 any clue? this is weird.
 
 
> El 14 jun 2016, a las 9:28, Julio Saura  > escribió:
> 
> hello
> 
> yes is correct .. it was the first thing i checked ..
> 
> first master
> 
> etcdClientInfo:
> ca: master.etcd-ca.crt
> certFile: master.etcd-client.crt
> keyFile: master.etcd-client.key
> urls:
>  - https://openshift-balancer01:2379 
>  - https://openshift-balancer02:2379 
> 
> 
> second master
> 
> etcdClientInfo:
> ca: master.etcd-ca.crt
> certFile: master.etcd-client.crt
> keyFile: master.etcd-client.key
> urls:
>  - https://openshift-balancer01:2379 
>  - https://openshift-balancer02:2379 
> 
> dns names resolve in both masters
> 
> Best regards and thanks!
> 
> 
>> El 13 jun 2016, a las 18:45, Scott Dodson > >
>> escribió:
>> 
>> Can you verify the connection information etcdClientInfo section in
>> /etc/origin/master/master-config.yaml is correct?
>> 
>> On Mon, Jun 13, 2016 at 11:56 AM, Julio Saura > >
>> wrote:
>>> hello
>>> 
>>> yes.. i have a external balancer in front of my masters for HA as doc
>>> says.
>>> 
>>> i don’t have any balancer in front of my etcd servers for masters
>>> connection, it’s not necessary right? masters will try all etcd 
>>> availables
>>> it one is down right?
>>> 
>>> i don’t know why but none of my masters were able to connect to the
>>> second etcd instance, but using telnet from their shell worked .. so it 
>>> was
>>> not a net o fw issue..
>>> 
>>> 
>>> best regards.
>>> 
 El 13 jun 2016, a las 17:53, Clayton Coleman >
 escribió:
 credentials from
 I have not seen that particular issue.  Do you have a load balancer
 in
 between your masters and etcd?
 
 On Fri, Jun 10, 2016 at 5:55 AM, Julio Saura >
 wrote:
> hello
> 
> i have an origin 3.1 installation working cool so far
> 
> today one of my etcd nodes ( 1 of 2 ) crashed and i started having
> problems..
> 
> i noticed on one of my master nodes that it was not able to connect
> to second etcd server and that the etcd server was not able to 
> promote as
> leader..
> 
> 
> un 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 is
> starting a new election at term 10048
> jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4
> became candidate at term 10049
> jun 10 11:09:55 openshift-balancer02 etcd[47218]: 

Re: MySQL: Readiness probe failed

2016-06-21 Thread Philippe Lafoucrière
Have you tried to raise initialDelaySeconds ?
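
For example (a sketch; the value is arbitrary, and it assumes a dc named mysql
and an oc client recent enough to have "oc set probe"; otherwise edit
readinessProbe.initialDelaySeconds by hand with oc edit dc/mysql):

    # give mysqld more time to finish initializing before the first readiness check:
    oc set probe dc/mysql --readiness --initial-delay-seconds=30
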
___
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users


Re: MySQL: Readiness probe failed

2016-06-21 Thread Martin Nagy
Ah, ignore that, I didn't realize the warnings about ioctl are irrelevant.

On Tue, Jun 21, 2016 at 4:56 PM, Martin Nagy  wrote:
> Are you referring to templates in origin/examples/db-templates?
> Can you edit the template and remove the '-i' passed to /bin/sh
> and see if it helps?
>
> On Tue, Jun 21, 2016 at 4:44 PM, Den Cowboy  wrote:
>> I'm using the MySQL template and started it with the right environment
>> variables
>> The MySQL is running fine but I got this error and I'm not able to access my
>> mysql on its service name:
>>
>> mysql -u myuser -h mysql -p
>> password:xxx
>>
>>
>> Readiness probe failed: sh: cannot set terminal process group (-1):
>> Inappropriate ioctl for device
>> sh: no job control in this shell
>> ERROR 2003 (HY000): Can't connect to MySQL server on '127.0.0.1' (111)
>>
>>
>> ___
>> users mailing list
>> users@lists.openshift.redhat.com
>> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>>

___
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users


Re: MySQL: Readiness probe failed

2016-06-21 Thread Martin Nagy
Are you referring to templates in origin/examples/db-templates?
Can you edit the template and remove the '-i' passed to /bin/sh
and see if it helps?
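
For reference, a sketch of the change being suggested (the exact probe text in
your template may differ slightly): the readiness probe execs something like

    /bin/sh -i -c "MYSQL_PWD=$MYSQL_PASSWORD mysql -h 127.0.0.1 -u $MYSQL_USER -D $MYSQL_DATABASE -e 'SELECT 1'"

and the edit (e.g. via oc edit dc/mysql) is simply to drop the -i flag:

    /bin/sh -c "MYSQL_PWD=$MYSQL_PASSWORD mysql -h 127.0.0.1 -u $MYSQL_USER -D $MYSQL_DATABASE -e 'SELECT 1'"

That only silences the "cannot set terminal process group" / "no job control"
noise; the ERROR 2003 connection failure is a separate issue.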

On Tue, Jun 21, 2016 at 4:44 PM, Den Cowboy  wrote:
> I'm using the MySQL template and started it with the right environment
> variables
> The MySQL is running fine but I got this error and I'm not able to access my
> mysql on its service name:
>
> mysql -u myuser -h mysql -p
> password:xxx
>
>
> Readiness probe failed: sh: cannot set terminal process group (-1):
> Inappropriate ioctl for device
> sh: no job control in this shell
> ERROR 2003 (HY000): Can't connect to MySQL server on '127.0.0.1' (111)
>
>
> ___
> users mailing list
> users@lists.openshift.redhat.com
> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>

___
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users


MySQL: Readiness probe failed

2016-06-21 Thread Den Cowboy
I'm using the MySQL template and started it with the right environment variables.
MySQL is running fine, but I get this error and I'm not able to access
mysql via its service name:

mysql -u myuser -h mysql -p
password:xxx


Readiness probe failed: sh: cannot set terminal process group (-1): 
Inappropriate ioctl for device
sh: no job control in this shell
ERROR 2003 (HY000): Can't connect to MySQL server on '127.0.0.1' (111)

  ___
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users


Re: weird issue with etcd

2016-06-21 Thread Julio Saura
hello

yes, I only have two .. I know 3 is the recommended number, but I guessed it
might also work with 2 :/

yes, both etcd servers can connect to each other on the peer port .. I have
checked that too ..

so maybe it is because I only have two etcd members instead of three?

thanks scott!!


> El 21 jun 2016, a las 15:19, Scott Dodson  escribió:
> 
> Julio,
> 
> First, it looks like you've only got two etcd hosts, in order to
> tolerate failure of a single host you'll want three.
> From your master config it looks like your two etcd hosts are
> openshift-balancer01 and openshift-balancer02, can each of those hosts
> connect to each other on port 2380? They will connect directly to each
> other for clustering purposes, then the masters will connect to each
> of the etcd hosts on port 2379 for client connectivity.
> 
> --
> Scott
> 
> On Tue, Jun 21, 2016 at 7:28 AM, Julio Saura  wrote:
>> yes
>> 
>> working
>> 
>> [root@openshift-master01 ~]# telnet X 2380
>> Trying ...
>> Connected to .
>> Escape character is '^]'.
>> ^CConnection closed by foreign host.
>> 
>> 
>> El 21 jun 2016, a las 13:21, Jason DeTiberus  escribió:
>> 
>> Did you verify connectivity over the peering port as well (2380)?
>> 
>> On Jun 21, 2016 7:17 AM, "Julio Saura"  wrote:
>>> 
>>> hello
>>> 
>>> same problem
>>> 
>>> jun 21 13:11:03 openshift-master01 atomic-openshift-master-api[59618]:
>>> F0621 13:11:03.155246   59618 auth.go:141] error #0: dial tcp :2379:
>>> connection refused ( the one i rebooted )
>>> jun 21 13:11:03 openshift-master01 atomic-openshift-master-api[59618]:
>>> error #1: client: etcd member https://:2379 has no leader
>>> 
>>> i rebooted the etcd server and my master is not able to use other one
>>> 
>>> still able to connect from both masters using telnet to the etcd port ..
>>> 
>>> any clue? this is weird.
>>> 
>>> 
 El 14 jun 2016, a las 9:28, Julio Saura  escribió:
 
 hello
 
 yes is correct .. it was the first thing i checked ..
 
 first master
 
 etcdClientInfo:
 ca: master.etcd-ca.crt
 certFile: master.etcd-client.crt
 keyFile: master.etcd-client.key
 urls:
  - https://openshift-balancer01:2379
  - https://openshift-balancer02:2379
 
 
 second master
 
 etcdClientInfo:
 ca: master.etcd-ca.crt
 certFile: master.etcd-client.crt
 keyFile: master.etcd-client.key
 urls:
  - https://openshift-balancer01:2379
  - https://openshift-balancer02:2379
 
 dns names resolve in both masters
 
 Best regards and thanks!
 
 
> El 13 jun 2016, a las 18:45, Scott Dodson 
> escribió:
> 
> Can you verify the connection information etcdClientInfo section in
> /etc/origin/master/master-config.yaml is correct?
> 
> On Mon, Jun 13, 2016 at 11:56 AM, Julio Saura 
> wrote:
>> hello
>> 
>> yes.. i have a external balancer in front of my masters for HA as doc
>> says.
>> 
>> i don’t have any balancer in front of my etcd servers for masters
>> connection, it’s not necessary right? masters will try all etcd 
>> availables
>> it one is down right?
>> 
>> i don’t know why but none of my masters were able to connect to the
>> second etcd instance, but using telnet from their shell worked .. so it 
>> was
>> not a net o fw issue..
>> 
>> 
>> best regards.
>> 
>>> El 13 jun 2016, a las 17:53, Clayton Coleman 
>>> escribió:
>>> 
>>> I have not seen that particular issue.  Do you have a load balancer
>>> in
>>> between your masters and etcd?
>>> 
>>> On Fri, Jun 10, 2016 at 5:55 AM, Julio Saura 
>>> wrote:
 hello
 
 i have an origin 3.1 installation working cool so far
 
 today one of my etcd nodes ( 1 of 2 ) crashed and i started having
 problems..
 
 i noticed on one of my master nodes that it was not able to connect
 to second etcd server and that the etcd server was not able to promote 
 as
 leader..
 
 
 un 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 is
 starting a new election at term 10048
 jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4
 became candidate at term 10049
 jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4
 received vote from 12c8a31c8fcae0d4 at term 10049
 jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4
 [logterm: 8, index: 4600461] sent vote request to bf80ee3a26e8772c at 
 term
 10049
 jun 10 11:09:56 openshift-balancer02 etcd[47218]: got unexpected
 response error (etcdserver: request timed out)
 
 my 

Re: weird issue with etcd

2016-06-21 Thread Julio Saura
hello

yes, they are synced with an internal NTP server ..

gonna try etcdctl, thanks!


> El 21 jun 2016, a las 15:20, Jason DeTiberus  escribió:
> 
> On Tue, Jun 21, 2016 at 7:28 AM, Julio Saura  > wrote:
>> yes
>> 
>> working
>> 
>> [root@openshift-master01 ~]# telnet X 2380
>> Trying ...
>> Connected to .
>> Escape character is '^]'.
>> ^CConnection closed by foreign host.
> 
> 
> Have you verified that time is syncd between the hosts? I'd also check
> the peer certs between the hosts... Can you connect to the hosts using
> etcdctl? There should be a status command that will give you more
> information.
> 
>> 
>> 
>> El 21 jun 2016, a las 13:21, Jason DeTiberus  escribió:
>> 
>> Did you verify connectivity over the peering port as well (2380)?
>> 
>> On Jun 21, 2016 7:17 AM, "Julio Saura"  wrote:
>>> 
>>> hello
>>> 
>>> same problem
>>> 
>>> jun 21 13:11:03 openshift-master01 atomic-openshift-master-api[59618]:
>>> F0621 13:11:03.155246   59618 auth.go:141] error #0: dial tcp :2379:
>>> connection refused ( the one i rebooted )
>>> jun 21 13:11:03 openshift-master01 atomic-openshift-master-api[59618]:
>>> error #1: client: etcd member https://:2379 has no leader
>>> 
>>> i rebooted the etcd server and my master is not able to use other one
>>> 
>>> still able to connect from both masters using telnet to the etcd port ..
>>> 
>>> any clue? this is weird.
>>> 
>>> 
 El 14 jun 2016, a las 9:28, Julio Saura  escribió:
 
 hello
 
 yes is correct .. it was the first thing i checked ..
 
 first master
 
 etcdClientInfo:
 ca: master.etcd-ca.crt
 certFile: master.etcd-client.crt
 keyFile: master.etcd-client.key
 urls:
  - https://openshift-balancer01:2379
  - https://openshift-balancer02:2379
 
 
 second master
 
 etcdClientInfo:
 ca: master.etcd-ca.crt
 certFile: master.etcd-client.crt
 keyFile: master.etcd-client.key
 urls:
  - https://openshift-balancer01:2379
  - https://openshift-balancer02:2379
 
 dns names resolve in both masters
 
 Best regards and thanks!
 
 
> El 13 jun 2016, a las 18:45, Scott Dodson 
> escribió:
> 
> Can you verify the connection information etcdClientInfo section in
> /etc/origin/master/master-config.yaml is correct?
> 
> On Mon, Jun 13, 2016 at 11:56 AM, Julio Saura 
> wrote:
>> hello
>> 
>> yes.. i have a external balancer in front of my masters for HA as doc
>> says.
>> 
>> i don’t have any balancer in front of my etcd servers for masters
>> connection, it’s not necessary right? masters will try all etcd 
>> availables
>> it one is down right?
>> 
>> i don’t know why but none of my masters were able to connect to the
>> second etcd instance, but using telnet from their shell worked .. so it 
>> was
>> not a net o fw issue..
>> 
>> 
>> best regards.
>> 
>>> El 13 jun 2016, a las 17:53, Clayton Coleman 
>>> escribió:
>>> credentials from
>>> I have not seen that particular issue.  Do you have a load balancer
>>> in
>>> between your masters and etcd?
>>> 
>>> On Fri, Jun 10, 2016 at 5:55 AM, Julio Saura 
>>> wrote:
 hello
 
 i have an origin 3.1 installation working cool so far
 
 today one of my etcd nodes ( 1 of 2 ) crashed and i started having
 problems..
 
 i noticed on one of my master nodes that it was not able to connect
 to second etcd server and that the etcd server was not able to promote 
 as
 leader..
 
 
 un 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 is
 starting a new election at term 10048
 jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4
 became candidate at term 10049
 jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4
 received vote from 12c8a31c8fcae0d4 at term 10049
 jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4
 [logterm: 8, index: 4600461] sent vote request to bf80ee3a26e8772c at 
 term
 10049
 jun 10 11:09:56 openshift-balancer02 etcd[47218]: got unexpected
 response error (etcdserver: request timed out)
 
 my masters logged that they were not able to connect to the etcd
 
 er.go:218] unexpected ListAndWatch error: pkg/storage/cacher.go:161:
 Failed to list *extensions.Job: error #0: dial tcp X.X.X.X:2379: 
 connection
 refused
 
 so i tried a simple test, just telnet from masters to the etcd node
 port ..
 

Re: weird issue with etcd

2016-06-21 Thread Jason DeTiberus
On Tue, Jun 21, 2016 at 7:28 AM, Julio Saura  wrote:
> yes
>
> working
>
> [root@openshift-master01 ~]# telnet X 2380
> Trying ...
> Connected to .
> Escape character is '^]'.
> ^CConnection closed by foreign host.


Have you verified that time is synced between the hosts? I'd also check
the peer certs between the hosts... Can you connect to the hosts using
etcdctl? There should be a status command that will give you more
information.
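
Two concrete checks along those lines (a sketch; /etc/etcd/peer.crt is the
usual openshift-ansible location, adjust if your layout differs):

    # on each etcd host, confirm the peer certificate is valid and names the right host:
    openssl x509 -in /etc/etcd/peer.crt -noout -subject -dates

    # from a master, etcdctl v2's closest thing to a status command:
    etcdctl -C https://openshift-balancer01:2379,https://openshift-balancer02:2379 \
        --ca-file=/etc/origin/master/master.etcd-ca.crt \
        --cert-file=/etc/origin/master/master.etcd-client.crt \
        --key-file=/etc/origin/master/master.etcd-client.key cluster-health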

>
>
> El 21 jun 2016, a las 13:21, Jason DeTiberus  escribió:
>
> Did you verify connectivity over the peering port as well (2380)?
>
> On Jun 21, 2016 7:17 AM, "Julio Saura"  wrote:
>>
>> hello
>>
>> same problem
>>
>> jun 21 13:11:03 openshift-master01 atomic-openshift-master-api[59618]:
>> F0621 13:11:03.155246   59618 auth.go:141] error #0: dial tcp :2379:
>> connection refused ( the one i rebooted )
>> jun 21 13:11:03 openshift-master01 atomic-openshift-master-api[59618]:
>> error #1: client: etcd member https://:2379 has no leader
>>
>> i rebooted the etcd server and my master is not able to use other one
>>
>> still able to connect from both masters using telnet to the etcd port ..
>>
>> any clue? this is weird.
>>
>>
>> > El 14 jun 2016, a las 9:28, Julio Saura  escribió:
>> >
>> > hello
>> >
>> > yes is correct .. it was the first thing i checked ..
>> >
>> > first master
>> >
>> > etcdClientInfo:
>> > ca: master.etcd-ca.crt
>> > certFile: master.etcd-client.crt
>> > keyFile: master.etcd-client.key
>> > urls:
>> >   - https://openshift-balancer01:2379
>> >   - https://openshift-balancer02:2379
>> >
>> >
>> > second master
>> >
>> > etcdClientInfo:
>> > ca: master.etcd-ca.crt
>> > certFile: master.etcd-client.crt
>> > keyFile: master.etcd-client.key
>> > urls:
>> >   - https://openshift-balancer01:2379
>> >   - https://openshift-balancer02:2379
>> >
>> > dns names resolve in both masters
>> >
>> > Best regards and thanks!
>> >
>> >
>> >> El 13 jun 2016, a las 18:45, Scott Dodson 
>> >> escribió:
>> >>
>> >> Can you verify the connection information etcdClientInfo section in
>> >> /etc/origin/master/master-config.yaml is correct?
>> >>
>> >> On Mon, Jun 13, 2016 at 11:56 AM, Julio Saura 
>> >> wrote:
>> >>> hello
>> >>>
>> >>> yes.. i have a external balancer in front of my masters for HA as doc
>> >>> says.
>> >>>
>> >>> i don’t have any balancer in front of my etcd servers for masters
>> >>> connection, it’s not necessary right? masters will try all etcd 
>> >>> availables
>> >>> it one is down right?
>> >>>
>> >>> i don’t know why but none of my masters were able to connect to the
>> >>> second etcd instance, but using telnet from their shell worked .. so it 
>> >>> was
>> >>> not a net o fw issue..
>> >>>
>> >>>
>> >>> best regards.
>> >>>
>>  El 13 jun 2016, a las 17:53, Clayton Coleman 
>>  escribió:
>> credentials from
>>  I have not seen that particular issue.  Do you have a load balancer
>>  in
>>  between your masters and etcd?
>> 
>>  On Fri, Jun 10, 2016 at 5:55 AM, Julio Saura 
>>  wrote:
>> > hello
>> >
>> > i have an origin 3.1 installation working cool so far
>> >
>> > today one of my etcd nodes ( 1 of 2 ) crashed and i started having
>> > problems..
>> >
>> > i noticed on one of my master nodes that it was not able to connect
>> > to second etcd server and that the etcd server was not able to promote 
>> > as
>> > leader..
>> >
>> >
>> > un 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 is
>> > starting a new election at term 10048
>> > jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4
>> > became candidate at term 10049
>> > jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4
>> > received vote from 12c8a31c8fcae0d4 at term 10049
>> > jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4
>> > [logterm: 8, index: 4600461] sent vote request to bf80ee3a26e8772c at 
>> > term
>> > 10049
>> > jun 10 11:09:56 openshift-balancer02 etcd[47218]: got unexpected
>> > response error (etcdserver: request timed out)
>> >
>> > my masters logged that they were not able to connect to the etcd
>> >
>> > er.go:218] unexpected ListAndWatch error: pkg/storage/cacher.go:161:
>> > Failed to list *extensions.Job: error #0: dial tcp X.X.X.X:2379: 
>> > connection
>> > refused
>> >
>> > so i tried a simple test, just telnet from masters to the etcd node
>> > port ..
>> >
>> > [root@openshift-master01 log]# telnet X.X.X.X 2379
>> > Trying X.X.X.X...
>> > Connected to X.X.X.X.
>> > Escape character is '^]’
>> >
>> > so i was able to connect from masters.
>> >
>> > i was not able to recover my oc masters until the first etcd node
>> 

Re: weird issue with etcd

2016-06-21 Thread Scott Dodson
Julio,

First, it looks like you've only got two etcd hosts; in order to
tolerate the failure of a single host you'll want three.
From your master config it looks like your two etcd hosts are
openshift-balancer01 and openshift-balancer02, can each of those hosts
connect to each other on port 2380? They will connect directly to each
other for clustering purposes, then the masters will connect to each
of the etcd hosts on port 2379 for client connectivity.
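
To spell out the quorum arithmetic, plus a hedged sketch of registering a third
member with the etcd2 tooling used elsewhere in this thread (the new host name
is illustrative):

    quorum = floor(n/2) + 1
    n = 2  ->  quorum = 2   (either member failing blocks leader election and writes)
    n = 3  ->  quorum = 2   (one member can fail and the cluster still elects a leader)

    # register the new member before starting etcd on it:
    etcdctl -C https://openshift-balancer01:2379 \
        --ca-file=/etc/origin/master/master.etcd-ca.crt \
        --cert-file=/etc/origin/master/master.etcd-client.crt \
        --key-file=/etc/origin/master/master.etcd-client.key \
        member add openshift-balancer03 https://openshift-balancer03:2380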

--
Scott

On Tue, Jun 21, 2016 at 7:28 AM, Julio Saura  wrote:
> yes
>
> working
>
> [root@openshift-master01 ~]# telnet X 2380
> Trying ...
> Connected to .
> Escape character is '^]'.
> ^CConnection closed by foreign host.
>
>
> El 21 jun 2016, a las 13:21, Jason DeTiberus  escribió:
>
> Did you verify connectivity over the peering port as well (2380)?
>
> On Jun 21, 2016 7:17 AM, "Julio Saura"  wrote:
>>
>> hello
>>
>> same problem
>>
>> jun 21 13:11:03 openshift-master01 atomic-openshift-master-api[59618]:
>> F0621 13:11:03.155246   59618 auth.go:141] error #0: dial tcp :2379:
>> connection refused ( the one i rebooted )
>> jun 21 13:11:03 openshift-master01 atomic-openshift-master-api[59618]:
>> error #1: client: etcd member https://:2379 has no leader
>>
>> i rebooted the etcd server and my master is not able to use other one
>>
>> still able to connect from both masters using telnet to the etcd port ..
>>
>> any clue? this is weird.
>>
>>
>> > El 14 jun 2016, a las 9:28, Julio Saura  escribió:
>> >
>> > hello
>> >
>> > yes is correct .. it was the first thing i checked ..
>> >
>> > first master
>> >
>> > etcdClientInfo:
>> > ca: master.etcd-ca.crt
>> > certFile: master.etcd-client.crt
>> > keyFile: master.etcd-client.key
>> > urls:
>> >   - https://openshift-balancer01:2379
>> >   - https://openshift-balancer02:2379
>> >
>> >
>> > second master
>> >
>> > etcdClientInfo:
>> > ca: master.etcd-ca.crt
>> > certFile: master.etcd-client.crt
>> > keyFile: master.etcd-client.key
>> > urls:
>> >   - https://openshift-balancer01:2379
>> >   - https://openshift-balancer02:2379
>> >
>> > dns names resolve in both masters
>> >
>> > Best regards and thanks!
>> >
>> >
>> >> El 13 jun 2016, a las 18:45, Scott Dodson 
>> >> escribió:
>> >>
>> >> Can you verify the connection information etcdClientInfo section in
>> >> /etc/origin/master/master-config.yaml is correct?
>> >>
>> >> On Mon, Jun 13, 2016 at 11:56 AM, Julio Saura 
>> >> wrote:
>> >>> hello
>> >>>
>> >>> yes.. i have a external balancer in front of my masters for HA as doc
>> >>> says.
>> >>>
>> >>> i don’t have any balancer in front of my etcd servers for masters
>> >>> connection, it’s not necessary right? masters will try all etcd 
>> >>> availables
>> >>> it one is down right?
>> >>>
>> >>> i don’t know why but none of my masters were able to connect to the
>> >>> second etcd instance, but using telnet from their shell worked .. so it 
>> >>> was
>> >>> not a net o fw issue..
>> >>>
>> >>>
>> >>> best regards.
>> >>>
>>  El 13 jun 2016, a las 17:53, Clayton Coleman 
>>  escribió:
>> 
>>  I have not seen that particular issue.  Do you have a load balancer
>>  in
>>  between your masters and etcd?
>> 
>>  On Fri, Jun 10, 2016 at 5:55 AM, Julio Saura 
>>  wrote:
>> > hello
>> >
>> > i have an origin 3.1 installation working cool so far
>> >
>> > today one of my etcd nodes ( 1 of 2 ) crashed and i started having
>> > problems..
>> >
>> > i noticed on one of my master nodes that it was not able to connect
>> > to second etcd server and that the etcd server was not able to promote 
>> > as
>> > leader..
>> >
>> >
>> > un 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 is
>> > starting a new election at term 10048
>> > jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4
>> > became candidate at term 10049
>> > jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4
>> > received vote from 12c8a31c8fcae0d4 at term 10049
>> > jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4
>> > [logterm: 8, index: 4600461] sent vote request to bf80ee3a26e8772c at 
>> > term
>> > 10049
>> > jun 10 11:09:56 openshift-balancer02 etcd[47218]: got unexpected
>> > response error (etcdserver: request timed out)
>> >
>> > my masters logged that they were not able to connect to the etcd
>> >
>> > er.go:218] unexpected ListAndWatch error: pkg/storage/cacher.go:161:
>> > Failed to list *extensions.Job: error #0: dial tcp X.X.X.X:2379: 
>> > connection
>> > refused
>> >
>> > so i tried a simple test, just telnet from masters to the etcd node
>> > port ..
>> >
>> > [root@openshift-master01 log]# telnet X.X.X.X 2379
>> > 

Re: weird issue with etcd

2016-06-21 Thread Julio Saura
yes

working

[root@openshift-master01 ~]# telnet X 2380
Trying ...
Connected to .
Escape character is '^]'.
^CConnection closed by foreign host.


> El 21 jun 2016, a las 13:21, Jason DeTiberus  > escribió:
> 
> Did you verify connectivity over the peering port as well (2380)?
> 
> On Jun 21, 2016 7:17 AM, "Julio Saura"  > wrote:
> hello
> 
> same problem
> 
> jun 21 13:11:03 openshift-master01 atomic-openshift-master-api[59618]: F0621 
> 13:11:03.155246   59618 auth.go:141] error #0: dial tcp :2379: connection 
> refused ( the one i rebooted )
> jun 21 13:11:03 openshift-master01 atomic-openshift-master-api[59618]: error 
> #1: client: etcd member https://:2379  has no leader
> 
> i rebooted the etcd server and my master is not able to use other one
> 
> still able to connect from both masters using telnet to the etcd port ..
> 
> any clue? this is weird.
> 
> 
> > El 14 jun 2016, a las 9:28, Julio Saura  > > escribió:
> >
> > hello
> >
> > yes is correct .. it was the first thing i checked ..
> >
> > first master
> >
> > etcdClientInfo:
> > ca: master.etcd-ca.crt
> > certFile: master.etcd-client.crt
> > keyFile: master.etcd-client.key
> > urls:
> >   - https://openshift-balancer01:2379 
> >   - https://openshift-balancer02:2379 
> >
> >
> > second master
> >
> > etcdClientInfo:
> > ca: master.etcd-ca.crt
> > certFile: master.etcd-client.crt
> > keyFile: master.etcd-client.key
> > urls:
> >   - https://openshift-balancer01:2379 
> >   - https://openshift-balancer02:2379 
> >
> > dns names resolve in both masters
> >
> > Best regards and thanks!
> >
> >
> >> El 13 jun 2016, a las 18:45, Scott Dodson  >> > escribió:
> >>
> >> Can you verify the connection information etcdClientInfo section in
> >> /etc/origin/master/master-config.yaml is correct?
> >>
> >> On Mon, Jun 13, 2016 at 11:56 AM, Julio Saura  >> > wrote:
> >>> hello
> >>>
> >>> yes.. i have a external balancer in front of my masters for HA as doc 
> >>> says.
> >>>
> >>> i don’t have any balancer in front of my etcd servers for masters 
> >>> connection, it’s not necessary right? masters will try all etcd 
> >>> availables it one is down right?
> >>>
> >>> i don’t know why but none of my masters were able to connect to the 
> >>> second etcd instance, but using telnet from their shell worked .. so it 
> >>> was not a net o fw issue..
> >>>
> >>>
> >>> best regards.
> >>>
>  El 13 jun 2016, a las 17:53, Clayton Coleman   > escribió:
> 
>  I have not seen that particular issue.  Do you have a load balancer in
>  between your masters and etcd?
> 
>  On Fri, Jun 10, 2016 at 5:55 AM, Julio Saura   > wrote:
> > hello
> >
> > i have an origin 3.1 installation working cool so far
> >
> > today one of my etcd nodes ( 1 of 2 ) crashed and i started having 
> > problems..
> >
> > i noticed on one of my master nodes that it was not able to connect to 
> > second etcd server and that the etcd server was not able to promote as 
> > leader..
> >
> >
> > un 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 is 
> > starting a new election at term 10048
> > jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 
> > became candidate at term 10049
> > jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 
> > received vote from 12c8a31c8fcae0d4 at term 10049
> > jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 
> > [logterm: 8, index: 4600461] sent vote request to bf80ee3a26e8772c at 
> > term 10049
> > jun 10 11:09:56 openshift-balancer02 etcd[47218]: got unexpected 
> > response error (etcdserver: request timed out)
> >
> > my masters logged that they were not able to connect to the etcd
> >
> > er.go:218] unexpected ListAndWatch error: pkg/storage/cacher.go:161: 
> > Failed to list *extensions.Job: error #0: dial tcp X.X.X.X:2379: 
> > connection refused
> >
> > so i tried a simple test, just telnet from masters to the etcd node 
> > port ..
> >
> > [root@openshift-master01 log]# telnet X.X.X.X 2379
> > Trying X.X.X.X...
> > Connected to X.X.X.X.
> > Escape character is '^]’
> >
> > so i was able to connect from masters.
> >
> > i was not able to recover my oc masters until the first etcd node 
> > rebooted .. so it seems my etcd “cluster” is not working without the 
> > first node ..

Re: weird issue with etcd

2016-06-21 Thread Jason DeTiberus
Did you verify connectivity over the peering port as well (2380)?
On Jun 21, 2016 7:17 AM, "Julio Saura"  wrote:

> hello
>
> same problem
>
> jun 21 13:11:03 openshift-master01 atomic-openshift-master-api[59618]:
> F0621 13:11:03.155246   59618 auth.go:141] error #0: dial tcp :2379:
> connection refused ( the one i rebooted )
> jun 21 13:11:03 openshift-master01 atomic-openshift-master-api[59618]:
> error #1: client: etcd member https://:2379 has no leader
>
> i rebooted the etcd server and my master is not able to use other one
>
> still able to connect from both masters using telnet to the etcd port ..
>
> any clue? this is weird.
>
>
> > El 14 jun 2016, a las 9:28, Julio Saura  escribió:
> >
> > hello
> >
> > yes is correct .. it was the first thing i checked ..
> >
> > first master
> >
> > etcdClientInfo:
> > ca: master.etcd-ca.crt
> > certFile: master.etcd-client.crt
> > keyFile: master.etcd-client.key
> > urls:
> >   - https://openshift-balancer01:2379
> >   - https://openshift-balancer02:2379
> >
> >
> > second master
> >
> > etcdClientInfo:
> > ca: master.etcd-ca.crt
> > certFile: master.etcd-client.crt
> > keyFile: master.etcd-client.key
> > urls:
> >   - https://openshift-balancer01:2379
> >   - https://openshift-balancer02:2379
> >
> > dns names resolve in both masters
> >
> > Best regards and thanks!
> >
> >
> >> El 13 jun 2016, a las 18:45, Scott Dodson 
> escribió:
> >>
> >> Can you verify the connection information etcdClientInfo section in
> >> /etc/origin/master/master-config.yaml is correct?
> >>
> >> On Mon, Jun 13, 2016 at 11:56 AM, Julio Saura 
> wrote:
> >>> hello
> >>>
> >>> yes.. i have a external balancer in front of my masters for HA as doc
> says.
> >>>
> >>> i don’t have any balancer in front of my etcd servers for masters
> connection, it’s not necessary right? masters will try all etcd availables
> it one is down right?
> >>>
> >>> i don’t know why but none of my masters were able to connect to the
> second etcd instance, but using telnet from their shell worked .. so it was
> not a net o fw issue..
> >>>
> >>>
> >>> best regards.
> >>>
>  El 13 jun 2016, a las 17:53, Clayton Coleman 
> escribió:
> 
>  I have not seen that particular issue.  Do you have a load balancer in
>  between your masters and etcd?
> 
>  On Fri, Jun 10, 2016 at 5:55 AM, Julio Saura 
> wrote:
> > hello
> >
> > i have an origin 3.1 installation working cool so far
> >
> > today one of my etcd nodes ( 1 of 2 ) crashed and i started having
> problems..
> >
> > i noticed on one of my master nodes that it was not able to connect
> to second etcd server and that the etcd server was not able to promote as
> leader..
> >
> >
> > un 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 is
> starting a new election at term 10048
> > jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4
> became candidate at term 10049
> > jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4
> received vote from 12c8a31c8fcae0d4 at term 10049
> > jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4
> [logterm: 8, index: 4600461] sent vote request to bf80ee3a26e8772c at term
> 10049
> > jun 10 11:09:56 openshift-balancer02 etcd[47218]: got unexpected
> response error (etcdserver: request timed out)
> >
> > my masters logged that they were not able to connect to the etcd
> >
> > er.go:218] unexpected ListAndWatch error: pkg/storage/cacher.go:161:
> Failed to list *extensions.Job: error #0: dial tcp X.X.X.X:2379: connection
> refused
> >
> > so i tried a simple test, just telnet from masters to the etcd node
> port ..
> >
> > [root@openshift-master01 log]# telnet X.X.X.X 2379
> > Trying X.X.X.X...
> > Connected to X.X.X.X.
> > Escape character is '^]’
> >
> > so i was able to connect from masters.
> >
> > i was not able to recover my oc masters until the first etcd node
> rebooted .. so it seems my etcd “cluster” is not working without the first
> node ..
> >
> > any clue?
> >
> > thanks
> >
> >
> > ___
> > users mailing list
> > users@lists.openshift.redhat.com
> > http://lists.openshift.redhat.com/openshiftmm/listinfo/users
> >>>
> >>>
> >>> ___
> >>> users mailing list
> >>> users@lists.openshift.redhat.com
> >>> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
> >
> >
> > ___
> > users mailing list
> > users@lists.openshift.redhat.com
> > http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>
>
> ___
> users mailing list
> users@lists.openshift.redhat.com
> 

Re: weird issue with etcd

2016-06-21 Thread Julio Saura
hello

same problem

jun 21 13:11:03 openshift-master01 atomic-openshift-master-api[59618]: F0621 
13:11:03.155246   59618 auth.go:141] error #0: dial tcp :2379: connection 
refused ( the one i rebooted )
jun 21 13:11:03 openshift-master01 atomic-openshift-master-api[59618]: error 
#1: client: etcd member https://:2379 has no leader

i rebooted the etcd server and my master is not able to use other one

still able to connect from both masters using telnet to the etcd port ..

any clue? this is weird.


> El 14 jun 2016, a las 9:28, Julio Saura  escribió:
> 
> hello
> 
> yes is correct .. it was the first thing i checked ..
> 
> first master
> 
> etcdClientInfo:
> ca: master.etcd-ca.crt
> certFile: master.etcd-client.crt
> keyFile: master.etcd-client.key
> urls:
>   - https://openshift-balancer01:2379
>   - https://openshift-balancer02:2379
> 
> 
> second master
> 
> etcdClientInfo:
> ca: master.etcd-ca.crt
> certFile: master.etcd-client.crt
> keyFile: master.etcd-client.key
> urls:
>   - https://openshift-balancer01:2379
>   - https://openshift-balancer02:2379
> 
> dns names resolve in both masters
> 
> Best regards and thanks!
> 
> 
>> El 13 jun 2016, a las 18:45, Scott Dodson  escribió:
>> 
>> Can you verify the connection information etcdClientInfo section in
>> /etc/origin/master/master-config.yaml is correct?
>> 
>> On Mon, Jun 13, 2016 at 11:56 AM, Julio Saura  wrote:
>>> hello
>>> 
>>> yes.. i have a external balancer in front of my masters for HA as doc says.
>>> 
>>> i don’t have any balancer in front of my etcd servers for masters 
>>> connection, it’s not necessary right? masters will try all etcd availables 
>>> it one is down right?
>>> 
>>> i don’t know why but none of my masters were able to connect to the second 
>>> etcd instance, but using telnet from their shell worked .. so it was not a 
>>> net o fw issue..
>>> 
>>> 
>>> best regards.
>>> 
 El 13 jun 2016, a las 17:53, Clayton Coleman  
 escribió:
 
 I have not seen that particular issue.  Do you have a load balancer in
 between your masters and etcd?
 
 On Fri, Jun 10, 2016 at 5:55 AM, Julio Saura  wrote:
> hello
> 
> i have an origin 3.1 installation working cool so far
> 
> today one of my etcd nodes ( 1 of 2 ) crashed and i started having 
> problems..
> 
> i noticed on one of my master nodes that it was not able to connect to 
> second etcd server and that the etcd server was not able to promote as 
> leader..
> 
> 
> un 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 is 
> starting a new election at term 10048
> jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 became 
> candidate at term 10049
> jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 
> received vote from 12c8a31c8fcae0d4 at term 10049
> jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 
> [logterm: 8, index: 4600461] sent vote request to bf80ee3a26e8772c at 
> term 10049
> jun 10 11:09:56 openshift-balancer02 etcd[47218]: got unexpected response 
> error (etcdserver: request timed out)
> 
> my masters logged that they were not able to connect to the etcd
> 
> er.go:218] unexpected ListAndWatch error: pkg/storage/cacher.go:161: 
> Failed to list *extensions.Job: error #0: dial tcp X.X.X.X:2379: 
> connection refused
> 
> so i tried a simple test, just telnet from masters to the etcd node port 
> ..
> 
> [root@openshift-master01 log]# telnet X.X.X.X 2379
> Trying X.X.X.X...
> Connected to X.X.X.X.
> Escape character is '^]’
> 
> so i was able to connect from masters.
> 
> i was not able to recover my oc masters until the first etcd node 
> rebooted .. so it seems my etcd “cluster” is not working without the 
> first node ..
> 
> any clue?
> 
> thanks
> 
> 
> ___
> users mailing list
> users@lists.openshift.redhat.com
> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>>> 
>>> 
>>> ___
>>> users mailing list
>>> users@lists.openshift.redhat.com
>>> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
> 
> 
> ___
> users mailing list
> users@lists.openshift.redhat.com
> http://lists.openshift.redhat.com/openshiftmm/listinfo/users


___
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users