On Thu, Feb 25, 2016 at 10:54 AM, Florian Daniel Otel <[email protected]> wrote:

> Hello all,
>
> I have the following problems:
>
> I have a multimaster OSE setup consisting of the following:
> - A LB with "native" HA
> - Three masters (doubling as "etcd" nodes)
> - Two nodes
>
>
> All the hosts are themselves OpenStack instances (hence the ".novalocal"
> suffix). DNS is via an "/etc/hosts" propagated across, with the "lb" host
> doubling as DNS forwarder (via dnsmasq). All Internet access is via an http
> / https proxy.
>

So, if I'm understanding this correctly, the lb host resolves DNS for all of the *.novalocal addresses in use by the cluster, and all of the hosts were pre-configured to use the lb host as their DNS resolver before the installation was run? If not, there will definitely be issues, since /etc/hosts is not used by deployed containers.

> After many attempts we finally got a setup that is somewhat working (see
> P.S. for why "somewhat"). Attached is the "/etc/ansible/hosts" file.
> Installation is from the main "openshift-ansible" repo
> (https://github.com/openshift/openshift-ansible).
>
> My problem:
>
> After installation, on one master I created two users in
> "/etc/origin/htpasswd". After creation I propagated the file to all
> the other masters. UNIX permissions on the file on all masters are "0600".
>
> However, doing an "oc login" returns a "401 Unauthorized", and I cannot
> find what the issue is, or how to debug it (no trace of it in the
> "atomic-openshift-master-api" or "atomic-openshift-master-controllers"
> logs).
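
Since the LB can send a login to any of the three masters, it may be worth ruling out a propagation problem first by comparing the copies of /etc/origin/htpasswd. A minimal Python sketch, assuming the files have already been copied from each master to one machine (the file names in the example are hypothetical) and that each master's master-config.yaml configures an HTPasswdPasswordIdentityProvider reading /etc/origin/htpasswd. It only checks that the copies are byte-identical and that the user appears in every copy; it does not verify the stored password hash.

```python
import hashlib
from pathlib import Path


def htpasswd_users(text):
    """Return the user names found in htpasswd-format text ("user:hash" per line)."""
    users = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        users.add(line.split(":", 1)[0])
    return users


def compare_htpasswd_copies(paths, user):
    """Map each local copy to (sha256 hex digest, whether `user` is present)."""
    report = {}
    for path in paths:
        data = Path(path).read_bytes()
        digest = hashlib.sha256(data).hexdigest()
        present = user in htpasswd_users(data.decode("utf-8", errors="replace"))
        report[str(path)] = (digest, present)
    return report


def copies_consistent(report):
    """True only if every copy has the same digest and contains the user."""
    digests = {digest for digest, _ in report.values()}
    return len(digests) == 1 and all(present for _, present in report.values())
```

For example, copies_consistent(compare_htpasswd_copies(["master01.htpasswd", "master02.htpasswd", "master03.htpasswd"], "reguser")) returns True only if all three copies match and contain reguser.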
>
>
> [root@az1node01 ~]# oc login
> Authentication required for https://az1lb01.mydomain.novalocal:8443 (openshift)
> Username: reguser
> Password:
> Login failed (401 Unauthorized)
> Unauthorized
>
>
> The puzzling thing is that using the "system:node" certificates and keys
> works (in the sense that I am identified as "system:anonymous"):
>

Something is definitely not right here: a request made with the system:node certs should be identified as the system:node user, not as system:anonymous. I suspect that there is a larger issue at play. It looks like the initial cluster creation may have had issues; the atomic-openshift-master-api logs should provide more insight into what may have gone wrong.

>
>
> curl -v --cacert /etc/origin/node/ca.crt --cert "/etc/origin/node/system:node:az1node01.mydomain.novalocal.crt" --key "/etc/origin/node/system:node:az1node01.mydomain.novalocal.key" https://az1lb01.mydomain.novalocal:8443/api/v1/namespaces
> * About to connect() to az1lb01.mydomain.novalocal port 8443 (#0)
> * Trying 10.0.0.31...
> * Connected to az1lb01.mydomain.novalocal (10.0.0.31) port 8443 (#0)
> * Initializing NSS with certpath: sql:/etc/pki/nssdb
> * CAfile: /etc/origin/node/ca.crt
> CApath: none
> * NSS: client certificate not found: /etc/origin/node/system
> * SSL connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
> * Server certificate:
> * subject: CN=10.0.0.24
> * start date: Feb 24 19:40:56 2016 GMT
> * expire date: Feb 23 19:40:57 2018 GMT
> * common name: 10.0.0.24
> * issuer: CN=openshift-signer@1456342841
> > GET /api/v1/namespaces HTTP/1.1
> > User-Agent: curl/7.29.0
> > Host: az1lb01.mydomain.novalocal:8443
> > Accept: */*
> >
> < HTTP/1.1 403 Forbidden
> < Cache-Control: no-store
> < Content-Type: application/json
> < Date: Thu, 25 Feb 2016 14:42:41 GMT
> < Content-Length: 255
> <
> {
>   "kind": "Status",
>   "apiVersion": "v1",
>   "metadata": {},
>   "status": "Failure",
>   "message": "User \"system:anonymous\" cannot list all namespaces in the cluster",
>   "reason": "Forbidden",
>   "details": {
>     "kind": "namespaces"
>   },
>   "code": 403
> }
> * Connection #0 to host az1lb01.mydomain.novalocal left intact
>
> Attached is also the master configuration file for one master.
>
>
> My questions:
>
> - I had many issues getting the installation working, mostly due to the
> Ansible installer reading the OpenStack instance metadata, and
> inconsistencies between that and the "hostname".
>
> Is there any particular repo / branch of the installer that is known to
> work in this particular setup? Any particular settings I should use in the
> Ansible hosts file?
>
> I suspect the certificate issues I'm encountering are because of that (in
> combination with the proxy) but I'm not sure.
>
> - Operating behind an HTTP / HTTPS proxy: Even before starting the Ansible
> installer, Docker was (properly) configured with the HTTP / HTTPS proxy
> settings, and working correctly.
> However, for the installer itself I found
> no way to express the "HTTP_PROXY", "HTTPS_PROXY" and, particularly, the
> "NO_PROXY" settings. For that I'm relying on exported environment
> variables in the shell. Is there a "proper" way to do this via the
> installer itself?
>

There is an openshift-ansible PR to expose this directly (https://github.com/openshift/openshift-ansible/pull/1385).

>
> After installation I manually added those settings to
> "/etc/sysconfig/atomic-openshift-master",
> "/etc/sysconfig/atomic-openshift-master-controllers",
> "/etc/sysconfig/atomic-openshift-master-api" and, for the
> nodes, "/etc/sysconfig/atomic-openshift-node", but I don't know how to do
> this via the installer itself.
>
>
> - Is there an issue with the masters doubling as "etcd" nodes?
>

No, there should not be any issues with co-locating the etcd service alongside the masters.

>
>
> The most frustrating part is that I have this very setup working
> perfectly fine in a public cloud environment (namely on GCE), but with the
> (three) "etcd" hosts distinct from the masters (i.e. a total of 9 hosts
> instead of 6), and with unproxied Internet access. However, that
> installation is from a different repo branch (namely from
> "https://github.com/detiber/openshift-ansible", the "gceFixes" branch).
>

I *believe* all of the fixes from gceFixes have been merged into master at this point.

>
>
> Thanks a lot for the help,
>
> Florian
>
> P.S. The weirdest case with respect to certificates is when trying to check the "etcd"
> cluster:
>
>
> [root@az1master01 ~]# etcdctl --debug -C https://az1master01.mydomain.novalocal:2379,https://az3master02.mydomain.novalocal:2379,https://az3master03.mydomain.novalocal:2379 --ca-file /etc/origin/master/ca.crt --cert-file /etc/origin/master/master.etcd-client.crt --key-file /etc/origin/master/master.etcd-client.key cluster-health
> Cluster-Endpoints: https://az3master02.mydomain.novalocal:2379, https://az1master01.mydomain.novalocal:2379, https://az3master03.mydomain.novalocal:2379
> cURL Command: curl -X GET https://az3master02.mydomain.novalocal:2379/v2/members
> cURL Command: curl -X GET https://az1master01.mydomain.novalocal:2379/v2/members
> cURL Command: curl -X GET https://az3master03.mydomain.novalocal:2379/v2/members
> cluster may be unhealthy: failed to list members
> Error: client: etcd cluster is unavailable or misconfigured
> error #0: x509: certificate signed by unknown authority
> error #1: x509: certificate signed by unknown authority
> error #2: x509: certificate signed by unknown authority
>

You need to use the etcd CA cert here:

etcdctl --debug -C https://az1master01.mydomain.novalocal:2379,https://az3master02.mydomain.novalocal:2379,https://az3master03.mydomain.novalocal:2379 --ca-file /etc/origin/master/master.etcd-ca.crt --cert-file /etc/origin/master/master.etcd-client.crt --key-file /etc/origin/master/master.etcd-client.key cluster-health

>
>
>
> Attempting a direct curl to "etcd":
>
> [root@az1master01 ~]# curl -v --cacert /etc/origin/master/ca.crt --cert /etc/origin/master/master.etcd-client.crt --key /etc/origin/master/master.etcd-client.key https://az1master01.mydomain.novalocal:2379/v2/members
> * About to connect() to az1master01.mydomain.novalocal port 2379 (#0)
> * Trying 10.0.0.22...
> * Connected to az1master01.mydomain.novalocal (10.0.0.22) port 2379 (#0)
> * Initializing NSS with certpath: sql:/etc/pki/nssdb
> * CAfile: /etc/origin/master/ca.crt
> CApath: none
> * Server certificate:
> * subject: CN=az1master01.mydomain.novalocal
> * start date: Feb 24 19:38:07 2016 GMT
> * expire date: Feb 23 19:38:07 2017 GMT
> * common name: az1master01.mydomain.novalocal
> * issuer: CN=etcd-signer@1456342665
> * NSS error -8179 (SEC_ERROR_UNKNOWN_ISSUER)
> * Peer's Certificate issuer is not recognized.
> * Closing connection 0
> curl: (60) Peer's Certificate issuer is not recognized.
> More details here: http://curl.haxx.se/docs/sslcerts.html
>
> curl performs SSL certificate verification by default, using a "bundle"
> of Certificate Authority (CA) public keys (CA certs). If the default
> bundle file isn't adequate, you can specify an alternate file
> using the --cacert option.
> If this HTTPS server uses a certificate signed by a CA represented in
> the bundle, the certificate verification probably failed due to a
> problem with the certificate (it might be expired, or the name might
> not match the domain name in the URL).
> If you'd like to turn off curl's verification of the certificate, use
> the -k (or --insecure) option.
> [root@az1master01 ~]#
>
>
> _______________________________________________
> users mailing list
> [email protected]
> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>

--
Jason DeTiberus
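
The direct curl at the end of the P.S. fails for the same reason as the etcdctl run: the server certificate's issuer is CN=etcd-signer@1456342665, but --cacert points at /etc/origin/master/ca.crt rather than at the etcd CA, so it also needs --cacert /etc/origin/master/master.etcd-ca.crt. As an illustration only, the Python sketch below performs the same GET /v2/members request against each endpoint over mutual TLS, verifying the server with the etcd CA. It assumes it runs on a master where the three certificate paths from this thread exist; the function names are made up for the example.

```python
import http.client
import json
import ssl
from urllib.parse import urlsplit

# Paths taken from this thread; adjust if your layout differs.
ETCD_CA = "/etc/origin/master/master.etcd-ca.crt"
CLIENT_CERT = "/etc/origin/master/master.etcd-client.crt"
CLIENT_KEY = "/etc/origin/master/master.etcd-client.key"


def parse_endpoints(spec):
    """Split an etcdctl-style -C value into (host, port) pairs.

    Whitespace around the commas is tolerated, so a value wrapped by a
    mail client still parses. A missing port defaults to 2379.
    """
    pairs = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        parts = urlsplit(item)
        pairs.append((parts.hostname, parts.port or 2379))
    return pairs


def list_members(host, port, ca=ETCD_CA, cert=CLIENT_CERT, key=CLIENT_KEY, timeout=5.0):
    """GET /v2/members, verifying the server against the etcd CA (not ca.crt)."""
    ctx = ssl.create_default_context(cafile=ca)
    ctx.load_cert_chain(certfile=cert, keyfile=key)  # client cert for mutual TLS
    conn = http.client.HTTPSConnection(host, port, context=ctx, timeout=timeout)
    try:
        conn.request("GET", "/v2/members")
        resp = conn.getresponse()
        body = resp.read()
        if resp.status != 200:
            raise RuntimeError(f"{host}:{port} returned HTTP {resp.status}")
        return json.loads(body)
    finally:
        conn.close()


def check_cluster(spec):
    """Query every endpoint and print its member names or the error it raised."""
    for host, port in parse_endpoints(spec):
        try:
            members = list_members(host, port)
            names = [m.get("name") for m in members.get("members", [])]
            print(f"{host}:{port} OK, members: {names}")
        except (OSError, RuntimeError, ValueError, http.client.HTTPException) as exc:
            print(f"{host}:{port} FAILED: {exc}")
```

For example, check_cluster("https://az1master01.mydomain.novalocal:2379,https://az3master02.mydomain.novalocal:2379,https://az3master03.mydomain.novalocal:2379") prints one line per endpoint. Errors are printed rather than raised, so a CA mismatch on one host does not hide the result for the others.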
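
On the system:anonymous result from the curl test against the API: it may be worth ruling out a simpler cause before digging further. The verbose output contains "NSS: client certificate not found: /etc/origin/node/system", which suggests curl cut the --cert path at its first colon. curl reads the --cert value as certificate[:passphrase], and its manual says a literal colon in the file name must be escaped as "\:". If that is what happened, the request went out without any client certificate, which by itself would explain being identified as system:anonymous. Whether the curl 7.29 build used here honours the escape is an assumption; copying the cert and key to colon-free file names is a simple fallback. The Python sketch below only builds the command line, and its helper names are hypothetical.

```python
import shlex


def curl_cert_arg(path):
    """Escape a file name for curl's --cert option.

    curl parses --cert as "certificate[:passphrase]", so a literal colon is
    written as "\\:" and a literal backslash as "\\\\". Backslashes are
    escaped first so the colon escapes are not escaped a second time.
    """
    return path.replace("\\", "\\\\").replace(":", "\\:")


def build_curl_argv(url, cacert, cert, key):
    """Build a curl argv list that sends a client certificate.

    Only the --cert value is escaped; --key takes a plain file name.
    """
    return [
        "curl", "-v",
        "--cacert", cacert,
        "--cert", curl_cert_arg(cert),
        "--key", key,
        url,
    ]


if __name__ == "__main__":
    node = "az1node01.mydomain.novalocal"
    argv = build_curl_argv(
        "https://az1lb01.mydomain.novalocal:8443/api/v1/namespaces",
        "/etc/origin/node/ca.crt",
        f"/etc/origin/node/system:node:{node}.crt",
        f"/etc/origin/node/system:node:{node}.key",
    )
    # Print a copy-pasteable shell command; argv could also go to subprocess.run.
    print(shlex.join(argv))
```

If the certificate is picked up, the verbose output should no longer show the "client certificate not found" line, and the 403 message should name a system:node user instead of system:anonymous.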
