Re: TSB fails to start

2018-03-14 Thread Tim Dudgeon

A little more on this.
On the nodes that are not working, the file 
/etc/cni/net.d/80-openshift-network.conf is not present.

This seems to cause errors like this in the origin-node service:

Mar 14 18:21:45 zzz-infra.openstacklocal origin-node[17833]: W0314 18:21:45.711715   17833 cni.go:189] Unable to update cni config: No networks found in /etc/cni/net.d


Where in the installation process does the 80-openshift-network.conf 
file get created?
I don't see anything in the ansible installer logs suggesting anything 
has gone wrong.
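For comparison, on a working node the file should contain something like this 
(quoting from memory, so the exact cniVersion value may differ):

{
  "cniVersion": "0.2.0",
  "name": "openshift-sdn",
  "type": "openshift-sdn"
}

My understanding is that the openshift-sdn node process writes this file once 
the SDN finishes initialising, rather than the installer writing it directly.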




On 13/03/18 17:02, Tim Dudgeon wrote:


This is still troubling me. I would welcome any input on this.

When I run an Ansible install (using Origin 3.7.1 on CentOS 7 nodes) 
the DNS setup on some nodes seems to get messed up at random. For 
instance, I've just run a setup with 1 master, 1 infra and 2 identical 
worker nodes.


During the installation one of the worker nodes starts responding very 
slowly. The other is fine.

Looking deeper, on the slow responding one I see a DNS setup like this:

[centos@xxx-node-001 ~]$ sudo netstat -tunlp | grep tcp | grep :53 | grep -v tcp6
tcp    0  0 10.0.0.20:53   0.0.0.0:*   LISTEN   14727/dnsmasq
tcp    0  0 172.17.0.1:53  0.0.0.0:*   LISTEN   14727/dnsmasq

[centos@xxx-node-001 ~]$ host orndev-bastion-002
;; connection timed out; trying next origin
orndev-bastion-002.openstacklocal has address 10.0.0.9


Whilst on the good one it looks like this:

[centos@xxx-node-002 ~]$ sudo netstat -tunlp | grep tcp | grep :53 | grep -v tcp6
tcp    0  0 127.0.0.1:53   0.0.0.0:*   LISTEN   17231/openshift
tcp    0  0 10.129.0.1:53  0.0.0.0:*   LISTEN   14563/dnsmasq
tcp    0  0 10.0.0.22:53   0.0.0.0:*   LISTEN   14563/dnsmasq
tcp    0  0 172.17.0.1:53  0.0.0.0:*   LISTEN   14563/dnsmasq

[centos@xxx-node-002 ~]$ host orndev-bastion-002
orndev-bastion-002.openstacklocal has address 10.0.0.9

Notice how two of the DNS listeners (127.0.0.1:53 from openshift and 
10.129.0.1:53 from dnsmasq) are not present on the bad node, and how this 
causes the DNS lookup to time out locally before falling back to an 
upstream server.
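For anyone wanting to reproduce the checks, this is all I'm doing on each 
node (the origin-dns file names are what I believe openshift-ansible drops in):

sudo ss -tlnp '( sport = :53 )'   # same information as the netstat above
cat /etc/resolv.conf              # should name the node's own IP (dnsmasq)
ls -l /etc/dnsmasq.d/             # origin-dns.conf, origin-upstream-dns.conf
sudo journalctl -u dnsmasq -e     # any startup errors?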


Getting into this state seems to be a random event.

Any thoughts?



On 01/03/18 14:30, Tim Dudgeon wrote:


Yes, I think it is related to DNS.

On a similar, but working, OpenStack environment `netstat -tunlp | 
grep ...` shows this:


tcp    0  0 127.0.0.1:53   0.0.0.0:*   LISTEN   16957/openshift
tcp    0  0 10.128.0.1:53  0.0.0.0:*   LISTEN   16248/dnsmasq
tcp    0  0 10.0.0.5:53    0.0.0.0:*   LISTEN   16248/dnsmasq
tcp    0  0 172.17.0.1:53  0.0.0.0:*   LISTEN   16248/dnsmasq
tcp    0  0 0.0.0.0:8053   0.0.0.0:*   LISTEN   12270/openshift


On the environment where the TSB is failing to start I'm seeing:

tcp    0  0 127.0.0.1:53   0.0.0.0:*   LISTEN   19067/openshift
tcp    0  0 10.129.0.1:53  0.0.0.0:*   LISTEN   16062/dnsmasq
tcp    0  0 172.17.0.1:53  0.0.0.0:*   LISTEN   16062/dnsmasq
tcp    0  0 0.0.0.0:8053   0.0.0.0:*   LISTEN   11628/openshift


Notice that in the first case dnsmasq is listening on the machine's 
IP address (line 3) but in the second case this is missing.


Both environments have been created with the openshift-ansible 
playbooks using an approach that is as equivalent as possible.
The contents of /etc/dnsmasq.d/ on the two systems also seem to be 
equivalent.
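If it helps, the installer-generated origin-dns.conf normally looks like this 
(reproduced from memory, so treat the exact server lines as approximate):

no-resolv
domain-needed
server=/cluster.local/172.30.0.1
server=/30.172.in-addr.arpa/172.30.0.1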


Any thoughts?



On 28/02/18 18:50, Nobuhiro Sue wrote:

Tim,

It seems to be a DNS issue. I guess your environment is on OpenStack, 
so please check the resolver (forward and reverse lookups).

You can see how DNS works on OpenShift 3.6 or above:
https://blog.openshift.com/dns-changes-red-hat-openshift-container-platform-3-6/
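
For example, something like this from each node (substitute your real names; 
the cluster.local query should work if the node DNS is healthy, since 
openshift itself serves DNS on 127.0.0.1:53):

host $(hostname)       # forward lookup of the node itself
host $(hostname -i)    # reverse lookup of the node's IP
dig +short kubernetes.default.svc.cluster.local @127.0.0.1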

2018-03-01 0:06 GMT+09:00 Tim Dudgeon:


Hi

I'm having problems getting an Origin cluster running, using the
ansible playbooks.
It fails at this point:

TASK [template_service_broker : Verify that TSB is running] **
FAILED - RETRYING: Verify that TSB is running (120 retries left).
FAILED - RETRYING: Verify that TSB is running (119 retries left).

FAILED - RETRYING: Verify that TSB is running (1 retries left).
fatal: [master-01.novalocal]: FAILED! => {"attempts": 120,
"changed": false, "cmd": ["curl", "-k",
"https://apiserver.openshift-template-service-broker.svc/healthz"],
"delta": "0:00:01.529402", "end": "2018-02-28 14:49:30.190842",
"msg": "non-zero return code", "rc": 7, "start": "2018-02-28
14:49:28.661440", "stderr": "  % Total    % Received % Xferd

CentOS PaaS SIG meeting (2018-03-14) [DST Time reminder]

2018-03-14 Thread Troy Dawson
Hello,
It's time for our weekly PaaS SIG sync-up meeting

Time: 1700 UTC - Wednesdays (date -d "1700 UTC")
Date: Today Wednesday, 14 March 2018
Where: IRC- Freenode - #centos-devel

For those in the United States, remember that we are using UTC time,
and so the time is an hour later than it was last week.

Agenda:
- OpenShift Current Status
-- rpms
-- Automated rpm building and Automated testing
-- Multi-arch
-- Documentation
- Upcoming Committee Member Changes
- Open Floor

Minutes from last meeting:
https://www.centos.org/minutes/2018/March/centos-devel.2018-03-07-17.02.log.html



Re: docker 1.13.1 breaking 3.7.0 installs

2018-03-14 Thread Walters, Todd
HI Alfredo,

I set this by installing origin-docker-excluder with yum. The excluder adds the 
line below to /etc/yum.conf. You can then edit the line manually if you want 
and add 1.13 to the exclusions; yum will then not try to install it.
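
After the edit the line in /etc/yum.conf would look like this (the stock line 
from the excluder package is quoted further down; the trailing docker*1.13* 
glob is the manual addition):

exclude= docker*1.20* docker*1.19* docker*1.18* docker*1.17* docker*1.16* docker*1.15* docker*1.14* docker*1.13*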

Of course, I also worked around it by downgrading docker across the nodes 
(`ansible -m shell -a 'yum downgrade docker docker-common docker-client -y' 
OSEv3`) and then setting openshift_disable_check=package_version for the 
install. Obviously not the best solution, but it works.
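
For anyone wondering where that setting lives: it goes in the [OSEv3:vars] 
section of the inventory, e.g. (a minimal sketch):

[OSEv3:vars]
openshift_disable_check=package_version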

Thanks,
Todd

From: Alfredo Palhares 
Date: Wednesday, March 14, 2018 at 5:05 AM
To: "Walters, Todd" 
Cc: "users@lists.openshift.redhat.com" 
Subject: Re: docker 1.13.1 breaking 3.7.0 installs

We’re seeing this same issue on new install of 3.7 or on upgrade from 3.6 to 
3.7.  We tried excluder for docker, but version in centos 3.7 repo only 
excludes to 1.14. You can manually add 1.13 to this list.

exclude= docker*1.20*  docker*1.19*  docker*1.18*  docker*1.17*  docker*1.16*  
docker*1.15*  docker*1.14*

Hmm, how did you set this?  On the inventory file?

Regards,
Alfredo Palhares

On Tue, Mar 13, 2018 at 3:11 PM, Walters, Todd 
> wrote:


On 3/12/18, 11:04 PM, "users-boun...@lists.openshift.redhat.com on behalf of 
users-requ...@lists.openshift.redhat.com" wrote:


   4. Docker 1.13.1 breaking 3.7.0 installs (Brigman, Larry)


Message: 4
Date: Tue, 13 Mar 2018 03:59:33 +
From: "Brigman, Larry" 
>
To: 
"users@lists.openshift.redhat.com"
>
Subject: Docker 1.13.1 breaking 3.7.0 installs
Message-ID:

<3b94f360ba2c1448b926eadf68c6be1c01f5707...@sdcexmbx2.arrs.arrisi.com>
Content-Type: text/plain; charset="us-ascii"

Looks like CentOS has released an update to Docker.  The playbooks want to 
use it, but there is another check that says it cannot use anything > 1.12.

None of the variables allow overriding this setting when using the rpm packages.
The only way I found to get this working is to modify
roles/openshift_health_checker/openshift_checks/package_version.py
Adding this line to the openshift_to_docker_version:
(3, 7): "1.13",

  --

We’re seeing this same issue on new install of 3.7 or on upgrade from 3.6 to 
3.7.  We tried excluder for docker, but version in centos 3.7 repo only 
excludes to 1.14. You can manually add 1.13 to this list.

exclude= docker*1.20*  docker*1.19*  docker*1.18*  docker*1.17*  docker*1.16*  
docker*1.15*  docker*1.14*

We’ve downgraded docker to 1.12 and set this during install as a temp 
workaround:   openshift_disable_check=package_version

Thanks,

Todd







Re: Can the Origin Ansible Playbook stop on "Restart node" **fatal** errors?

2018-03-14 Thread Joel Pearson
You could edit the
openshift-ansible\playbooks\common\openshift-node\restart.yml and add:

max_fail_percentage: 0

under

serial: "{{ openshift_restart_nodes_serial | default(1) }}"

That, in theory, should make it fail straight away.
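
For clarity, the top of the affected play would then look roughly like this 
(the hosts value here is a guess from memory; check it against the real 
restart.yml):

- name: Restart nodes
  hosts: oo_nodes_to_config
  serial: "{{ openshift_restart_nodes_serial | default(1) }}"
  max_fail_percentage: 0
  # ... rest of the play unchanged ...

With max_fail_percentage: 0, any single host failure aborts the play instead 
of carrying on with the surviving hosts.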

On Wed, Mar 14, 2018 at 9:46 PM Alan Christie <
achris...@informaticsmatters.com> wrote:

> Hi,
>
> I’ve been running the Ansible release-3.7 branch playbook and occasionally
> I get errors restarting nodes. I’m not looking for help on why my nodes are
> not restarting but I am curious as to why the playbook continues when there
> are fatal errors that eventually lead to a failure some 30 minutes or so
> later? Especially annoying if you happen a) not to be looking at the screen
> at the time of the original failure or b) to be running the installation
> inside another IaC framework.
>
> Is there an option to “stop on fatal” I’m missing by chance?
>
> Here’s a typical failure at (in my case) 21 minutes in…
>
>
> RUNNING HANDLER [openshift_node : restart node] ***
> Wednesday 14 March 2018  10:12:44 + (0:00:00.081)   0:21:47.968 ***
> skipping: [os-master-1]
> skipping: [os-node-001]
> FAILED - RETRYING: restart node (3 retries left).
> FAILED - RETRYING: restart node (3 retries left).
> FAILED - RETRYING: restart node (2 retries left).
> FAILED - RETRYING: restart node (2 retries left).
> FAILED - RETRYING: restart node (1 retries left).
> FAILED - RETRYING: restart node (1 retries left).
>
>
> fatal: [os-infra-1]: FAILED! => {"attempts": 3, "changed": false, "msg":
> "Unable to restart service origin-node: Job for origin-node.service failed
> because the control process exited with error code. See \"systemctl status
> origin-node.service\" and \"journalctl -xe\" for details.\n"}
> fatal: [os-node-002]: FAILED! => {"attempts": 3, "changed": false, "msg":
> "Unable to restart service origin-node: Job for origin-node.service failed
> because the control process exited with error code. See \"systemctl status
> origin-node.service\" and \"journalctl -xe\" for details.\n"}
>
> And the roll-out finally "gives up the ghost" (in my case) after a further
> 30 minutes...
>
> TASK [debug] ***
> Wednesday 14 March 2018  10:42:20 + (0:00:00.117)   0:51:23.829 ***
> skipping: [os-master-1]
> to retry, use: --limit
> @/home/centos/abc/orchestrator/openshift/openshift-ansible/playbooks/byo/config.retry
>
> PLAY RECAP
> ***
> localhost  : ok=13   changed=0    unreachable=0   failed=0
> os-infra-1 : ok=182  changed=70   unreachable=0   failed=1
> os-master-1: ok=539  changed=210  unreachable=0   failed=0
> os-node-001: ok=188  changed=65   unreachable=0   failed=0
> os-node-002: ok=165  changed=61   unreachable=0   failed=1
>
> Alan Christie
>
>
>
>


Re: Re: Re: Check if template service broker is running

2018-03-14 Thread marc . schlegel
I am trying another fresh install just now, and my ansible script has been 
hanging for 15 minutes at 
TASK [openshift_service_catalog : wait for api server to be ready]

It was the same the last few times I tried.

I made a minor adjustment to the ansible script by adding the following 
options:
openshift_enable_service_catalog=true
openshift_template_service_broker_namespaces=['openshift']

Is it possible that my DNS is not working? Is the api service listening on 
a special DNS name which I need to add to my domain? 
These are some other settings:
openshift_master_cluster_public_hostname="openshift.vnet.de"
openshift_master_default_subdomain=apps.vnet.de

The DNS is set up so that everything ending with vnet.de points to node-1 
(infra node) while openshift.vnet.de points to my master. The hostnames 
are also mapped explicitly.
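
So a quick sanity check from any node would be (the name under apps is 
arbitrary):

host openshift.vnet.de       # should resolve to the master
host anything.apps.vnet.de   # wildcard, should resolve to node-1 (infra)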





From:   marc.schle...@sdv-it.de
To:     users@lists.openshift.redhat.com
Date:   14.03.2018 09:13
Subject: Re: Re: Check if template service broker is running
Sent by: users-boun...@lists.openshift.redhat.com



That's what I get from my console. 


Logged into "https://openshift.vnet.de:8443; as "system:admin" using 
existing 
credentials.

You have access to the following projects and can switch between them with 
'oc project <projectname>':

 * default
   kube-public
   kube-system
   logging
   management-infra
   openshift
   openshift-infra
   openshift-node

Using project "default".
[root@master ~]# oc describe daemonset -n openshift-template-service-broker
[root@master ~]# oc get events -n openshift-template-service-broker
No resources found.
[root@master ~]# 
My cluster-setup looks like this 
- master 
- node-1 (label "region" : "infra" for infrastructure which is used in the 
ansible for the docker-registry and the router) 
- node-2 (label "region" : "primary" for deployments)

The only Ansible adjustments I made, apart from the necessary ones, were these: 
openshift_hostet_router_selector='region=infra' 
openshift_hostet_registry_selector='region=infra' 

regards 
Marc 



From:    Sam Padgett 
To:      marc.schle...@sdv-it.de 
Cc:      users 
Date:    09.03.2018 18:24 
Subject: Re: Re: Check if template service broker is running 



Do you see any obvious problems looking at...? 

$ oc describe daemonset -n openshift-template-service-broker 
$ oc get events -n openshift-template-service-broker 

(I'm assuming you haven't set `template_service_broker_install` to false 
in your inventory.) 
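
(For reference, if it had been set it would look like this in the inventory, 
so it's worth grepping for the variable:

[OSEv3:vars]
template_service_broker_install=false
)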

On Fri, Mar 9, 2018 at 2:30 AM,  wrote: 
The template-service-broker is indeed not running. 

[root@master ~]# oc get pods -n kube-service-catalog
NAME   READY STATUSRESTARTS   AGE
apiserver-w8p771/1   Running   2  6d
controller-manager-cmmcx   1/1   Running   5  6d

[root@master ~]# oc get pods -n openshift-template-service-broker
No resources found. 

The Ansible install finished with a warning which is probably the reason. 
Unfortunately a reinstall always ends with the same result (no 
template-service-broker) 

Can I manually install one and how? 


From:    Sam Padgett 
To:      marc.schle...@sdv-it.de 
Cc:      users 
Date:    08.03.2018 15:11 
Subject: Re: Check if template service broker is running 



Starting in 3.7, the service catalog and template service broker are enabled by 
default when installing using openshift-ansible. You can check if things 
are running with: 

$ oc get pods -n kube-service-catalog 
NAME READY STATUSRESTARTS   AGE 
apiserver-858dcddcdf-f58mv   2/2   Running   0  15m 
controller-manager-645f5dbbd-jz8ll   1/1   Running   0  15m 

$ oc get pods -n openshift-template-service-broker 
NAME  READY STATUSRESTARTS   AGE 
apiserver-4cq6q   1/1   Running   0  15m 

If template service broker is installed, but not running, that would 
explain why items are missing. 

On Thu, Mar 8, 2018 at 3:05 AM,  wrote: 
Hello everyone 

I am having trouble with the templates defined in the default 
image-streams. 

Checking which imagestreams and templates are installed via oc get lists 
everything I am expecting. 
Unfortunately the webconsole is only showing 8 items (Jenkins for example 
is missing). 

I've got some help from the OpenShift Google Groups which says that the 
template service broker might not be running [1]. 

How can I check if this service is running? 



[root@master ~]# oc get is -n openshift
NAME             DOCKER REPO                                                  TAGS         UPDATED
dotnet           docker-registry.default.svc:5000/openshift/dotnet            latest,2.0   About an hour ago
dotnet-runtime   docker-registry.default.svc:5000/openshift/dotnet-runtime    latest,2.0   About an hour ago
httpd  

Re: docker 1.13.1 breaking 3.7.0 installs

2018-03-14 Thread Alfredo Palhares
>
> We’re seeing this same issue on new install of 3.7 or on upgrade from 3.6
> to 3.7.  We tried excluder for docker, but version in centos 3.7 repo only
> excludes to 1.14. You can manually add 1.13 to this list.
>
> exclude= docker*1.20*  docker*1.19*  docker*1.18*  docker*1.17*
> docker*1.16*  docker*1.15*  docker*1.14*
>

Hmm, how did you set this?  On the inventory file?

Regards,
Alfredo Palhares

On Tue, Mar 13, 2018 at 3:11 PM, Walters, Todd 
wrote:

>
>
> On 3/12/18, 11:04 PM, "users-boun...@lists.openshift.redhat.com on behalf
> of users-requ...@lists.openshift.redhat.com" wrote:
>
>
>4. Docker 1.13.1 breaking 3.7.0 installs (Brigman, Larry)
>
>
> Message: 4
> Date: Tue, 13 Mar 2018 03:59:33 +
> From: "Brigman, Larry" 
> To: "users@lists.openshift.redhat.com"
> 
> Subject: Docker 1.13.1 breaking 3.7.0 installs
> Message-ID:
> <3b94f360ba2c1448b926eadf68c6be1c01f5707...@sdcexmbx2.arrs.arrisi.com>
> Content-Type: text/plain; charset="us-ascii"
>
> Looks like CentOS has released an update to Docker.  The playbooks
> want to use it, but there is another check that says it cannot use
> anything > 1.12.
>
> None of the variables allow overriding this setting when using the rpm
> packages.
> The only way I found to get this working is to modify
> roles/openshift_health_checker/openshift_checks/package_version.py
> Adding this line to the openshift_to_docker_version:
> (3, 7): "1.13",
>
>   --
>
> We’re seeing this same issue on new install of 3.7 or on upgrade from 3.6
> to 3.7.  We tried excluder for docker, but version in centos 3.7 repo only
> excludes to 1.14. You can manually add 1.13 to this list.
>
> exclude= docker*1.20*  docker*1.19*  docker*1.18*  docker*1.17*
> docker*1.16*  docker*1.15*  docker*1.14*
>
> We’ve downgraded docker to 1.12 and set this during install as a temp
> workaround:   openshift_disable_check=package_version
>
> Thanks,
>
> Todd
>
>
>
>


Re: Re: Check if template service broker is running

2018-03-14 Thread marc . schlegel
That's what I get from my console. 


Logged into "https://openshift.vnet.de:8443; as "system:admin" using 
existing 
credentials.

You have access to the following projects and can switch between them with 
'oc project <projectname>':

  * default
kube-public
kube-system
logging
management-infra
openshift
openshift-infra
openshift-node

Using project "default".
[root@master ~]# oc describe daemonset -n openshift-template-service-broker
[root@master ~]# oc get events -n openshift-template-service-broker
No resources found.
[root@master ~]# 

My cluster-setup looks like this
- master
- node-1 (label "region" : "infra" for infrastructure which is used in the 
ansible for the docker-registry and the router)
- node-2 (label "region" : "primary" for deployments)

The only Ansible adjustments I made, apart from the necessary ones, were these:
openshift_hostet_router_selector='region=infra'
openshift_hostet_registry_selector='region=infra'

regards
Marc




From:   Sam Padgett 
To:     marc.schle...@sdv-it.de
Cc:     users 
Date:   09.03.2018 18:24
Subject: Re: Re: Check if template service broker is running



Do you see any obvious problems looking at...?

$ oc describe daemonset -n openshift-template-service-broker
$ oc get events -n openshift-template-service-broker

(I'm assuming you haven't set `template_service_broker_install` to false 
in your inventory.)

On Fri, Mar 9, 2018 at 2:30 AM,  wrote:
The template-service-broker is indeed not running. 

[root@master ~]# oc get pods -n kube-service-catalog
NAME   READY STATUSRESTARTS   AGE
apiserver-w8p771/1   Running   2  6d
controller-manager-cmmcx   1/1   Running   5  6d

[root@master ~]# oc get pods -n openshift-template-service-broker
No resources found. 

The Ansible install finished with a warning which is probably the reason. 
Unfortunately a reinstall always ends with the same result (no 
template-service-broker) 

Can I manually install one and how? 


From:    Sam Padgett 
To:      marc.schle...@sdv-it.de 
Cc:      users 
Date:    08.03.2018 15:11 
Subject: Re: Check if template service broker is running 



Starting in 3.7, the service catalog and template service broker are enabled by 
default when installing using openshift-ansible. You can check if things 
are running with: 

$ oc get pods -n kube-service-catalog 
NAME READY STATUSRESTARTS   AGE 
apiserver-858dcddcdf-f58mv   2/2   Running   0  15m 
controller-manager-645f5dbbd-jz8ll   1/1   Running   0  15m 

$ oc get pods -n openshift-template-service-broker 
NAME  READY STATUSRESTARTS   AGE 
apiserver-4cq6q   1/1   Running   0  15m 

If template service broker is installed, but not running, that would 
explain why items are missing. 

On Thu, Mar 8, 2018 at 3:05 AM,  wrote: 
Hello everyone 

I am having trouble with the templates defined in the default 
image-streams. 

Checking which imagestreams and templates are installed via oc get lists 
everything I am expecting. 
Unfortunately the webconsole is only showing 8 items (Jenkins for example 
is missing). 

I've got some help from the OpenShift Google Groups which says that the 
template service broker might not be running [1]. 

How can I check if this service is running? 



[root@master ~]# oc get is -n openshift
NAME             DOCKER REPO                                                  TAGS                         UPDATED
dotnet           docker-registry.default.svc:5000/openshift/dotnet            latest,2.0                   About an hour ago
dotnet-runtime   docker-registry.default.svc:5000/openshift/dotnet-runtime    latest,2.0                   About an hour ago
httpd            docker-registry.default.svc:5000/openshift/httpd             latest,2.4                   About an hour ago
jenkins          docker-registry.default.svc:5000/openshift/jenkins           1,2,latest                   About an hour ago
mariadb          docker-registry.default.svc:5000/openshift/mariadb           10.1,latest                  About an hour ago
mongodb          docker-registry.default.svc:5000/openshift/mongodb           3.2,2.6,2.4 + 1 more...      About an hour ago
mysql            docker-registry.default.svc:5000/openshift/mysql             5.6,5.5,latest + 1 more...   About an hour ago
nodejs           docker-registry.default.svc:5000/openshift/nodejs            latest,0.10,4 + 1 more...    About an hour ago
perl             docker-registry.default.svc:5000/openshift/perl              5.24,5.20,5.16 + 1 more...   About an hour ago
php              docker-registry.default.svc:5000/openshift/php               latest,7.0,5.6 + 1 more...   About an hour ago
postgresql