Re: Help debug "oc login" returning "401" / certificate issues

Jason DeTiberus Thu, 25 Feb 2016 13:33:01 -0800

On Thu, Feb 25, 2016 at 10:54 AM, Florian Daniel Otel <
[email protected]> wrote:


>
> Hello all,
>
> I have the following problems:
>
> I have a multimaster OSE setup consisting of the following:
> - A LB with "native" HA
> - Three masters (doubling as "etcd" nodes)
> - Two nodes
>
>
> All the hosts are themselves OpenStack instances (hence the ".novalocal"
> suffix). DNS is via an "/etc/hosts" propagated across, with the "lb" host
> doubling as DNS forwarder (via dnsmasq). All Internet access is via an http
> / https proxy.
>

So, if I'm understanding this correctly, the lb host resolves DNS for all
of the *.novalocal names in use by the cluster, and all of the hosts were
configured to use the lb host as their DNS resolver before running the
installation? If not, there will definitely be issues, since deployed
containers do not use the hosts' /etc/hosts.
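
One way to check this is to query the lb's dnsmasq directly instead of
relying on a lookup that /etc/hosts might answer. The following is only a
sketch: it assumes dig (from bind-utils) is installed, and the function
names lookup and check_dns are made up for illustration.

```shell
# lookup NAME SERVER: print the records SERVER returns for NAME.
# Lines starting with ';' are dig diagnostics (e.g. timeouts), so drop them.
lookup() {
    dig +short "$1" "@$2" | grep -v '^;'
}

# check_dns SERVER NAME...: report every NAME that SERVER cannot resolve.
# Returns non-zero if at least one name is unresolved.
check_dns() {
    server=$1
    shift
    rc=0
    for h in "$@"; do
        if [ -z "$(lookup "$h" "$server")" ]; then
            echo "UNRESOLVED: $h"
            rc=1
        fi
    done
    return $rc
}

# Example, using the lb address seen in the curl trace (10.0.0.31):
#   check_dns 10.0.0.31 az1lb01.mydomain.novalocal \
#       az1master01.mydomain.novalocal az1node01.mydomain.novalocal
```

Running it from each host should show whether every cluster name is served
by the lb itself.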


>
> After many attempts we finally got a setup that is somewhat working (see
> P.S. for why "somewhat"). Attached is the "/etc/ansible/hosts" file.
> Installation is from the main "openshift-ansible" repo (
> https://github.com/openshift/openshift-ansible)
>
> My problem:
>
> After installation, on one master I created two users in
> "/etc/origin/htpasswd". After creation I have propagated the file to all
> the other masters. UNIX permissions to the file on all masters are "0600"
>
> However, doing an "oc login" returns a "401 Unauthorized", and I cannot
> find what the issue is, or how to debug it (no trace for it in the
> "atomic-openshift-master-api" or "atomic-openshift-master-controllers"
> logs).
>

>
> [root@az1node01 ~]# oc login
> Authentication required for https://az1lb01.mydomain.novalocal:8443
> (openshift)
> Username: reguser
> Password:
> Login failed (401 Unauthorized)
> Unauthorized
>
>
> The puzzling thing is that using the "system:node" certificates and keys
> work (in the sense I am identified as "system:anonymous"):
>

Something is definitely not right here: a request made with the
system:node certs should be identified as the system:node user, not as
anonymous. I suspect that there is a larger issue at play here.

It looks like the initial cluster creation may have had issues...  The
atomic-openshift-master-api logs should provide more insight into what may
have gone wrong.
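
One thing worth ruling out first: the quoted curl trace contains the line
"NSS: client certificate not found: /etc/origin/node/system". curl's
--cert option takes the form <certificate[:password]>, so a file name
containing ":" can be cut at the first colon. That suggests the request may
have gone out without any client certificate, which would explain
"system:anonymous". A version-independent workaround is to point curl at a
colon-free symlink. This is a sketch only, and colon_free_link is a
hypothetical helper name.

```shell
# colon_free_link SRC DIR: symlink SRC into DIR under a name in which every
# ':' is replaced by '_', and print the path of the new link.
colon_free_link() {
    src=$1
    dir=$2
    name=$(basename "$src" | tr ':' '_')
    ln -sf "$src" "$dir/$name"
    printf '%s\n' "$dir/$name"
}

# Example (run on the node; paths are the ones from the quoted command):
#   tmp=$(mktemp -d)
#   crt=$(colon_free_link \
#       "/etc/origin/node/system:node:az1node01.mydomain.novalocal.crt" "$tmp")
#   key=$(colon_free_link \
#       "/etc/origin/node/system:node:az1node01.mydomain.novalocal.key" "$tmp")
#   curl -v --cacert /etc/origin/node/ca.crt --cert "$crt" --key "$key" \
#       https://az1lb01.mydomain.novalocal:8443/api/v1/namespaces
```

If the "client certificate not found" line disappears from the trace, the
identity reported by the API should reflect the certificate actually sent.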


>
>
> curl -v --cacert  /etc/origin/node/ca.crt --cert
> "/etc/origin/node/system:node:az1node01.mydomain.novalocal.crt" --key
> "/etc/origin/node/system:node:az1node01.mydomain.novalocal.key"
> https://az1lb01.mydomain.novalocal:8443/api/v1/namespaces
> * About to connect() to az1lb01.mydomain.novalocal port 8443 (#0)
> *   Trying 10.0.0.31...
> * Connected to az1lb01.mydomain.novalocal (10.0.0.31) port 8443 (#0)
> * Initializing NSS with certpath: sql:/etc/pki/nssdb
> *   CAfile: /etc/origin/node/ca.crt
>   CApath: none
> * NSS: client certificate not found: /etc/origin/node/system
> * SSL connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
> * Server certificate:
> *       subject: CN=10.0.0.24
> *       start date: Feb 24 19:40:56 2016 GMT
> *       expire date: Feb 23 19:40:57 2018 GMT
> *       common name: 10.0.0.24
> *       issuer: CN=openshift-signer@1456342841
> > GET /api/v1/namespaces HTTP/1.1
> > User-Agent: curl/7.29.0
> > Host: az1lb01.mydomain.novalocal:8443
> > Accept: */*
> >
> < HTTP/1.1 403 Forbidden
> < Cache-Control: no-store
> < Content-Type: application/json
> < Date: Thu, 25 Feb 2016 14:42:41 GMT
> < Content-Length: 255
> <
> {
>   "kind": "Status",
>   "apiVersion": "v1",
>   "metadata": {},
>   "status": "Failure",
>   "message": "User \"system:anonymous\" cannot list all namespaces in the
> cluster",
>   "reason": "Forbidden",
>   "details": {
>     "kind": "namespaces"
>   },
>   "code": 403
> }
> * Connection #0 to host az1lb01.mydomain.novalocal left intact
>
> Attached is also the master configuration file for one master.
>
>
> My questions:
>
> - I had many issues in getting the installation working, mostly due to the
> Ansible installer reading the OpenStack instance metadata, and
> inconsistencies btw. that and the "hostname".
>
>   Is there any particular repo / branch of the installer that is known to
> work in this particular setup ? Any particular settings I should use in the
> Ansible hosts file ?
>
>   I suspect the certificate issues I'm encountering are because of that (in
> combination with the proxy) but I'm not sure.
>
> - Operating behind an HTTP / HTTPS proxy: Even before starting the Ansible
> installer, Docker was (properly) configured to the HTTP / HTTPS proxy
> settings, and working correctly. However, for the installer itself I found
> no way to express the "HTTP_PROXY" "HTTPS_PROXY" and, particularly, the
> "NO_PROXY" settings.  For that I'm relying on exported environment
> variables in the shell. Is there a "proper" way to do this via the
> installer itself?
>

There is an openshift-ansible PR to expose this directly (
https://github.com/openshift/openshift-ansible/pull/1385)
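
Until that is available, the manual approach described in the quoted
message can at least be scripted so that every sysconfig file gets the same
values. A rough sketch follows; the helper names join_no_proxy and proxy_env
and the proxy URL are placeholders, not anything the installer provides.

```shell
# join_no_proxy HOST...: join the arguments with commas.
join_no_proxy() {
    (IFS=,; printf '%s\n' "$*")
}

# proxy_env PROXY_URL HOST...: print the three settings in KEY=value form,
# with every HOST listed in NO_PROXY.
proxy_env() {
    proxy=$1
    shift
    printf 'HTTP_PROXY=%s\n' "$proxy"
    printf 'HTTPS_PROXY=%s\n' "$proxy"
    printf 'NO_PROXY=%s\n' "$(join_no_proxy "$@")"
}

# Example (proxy.example.com:3128 is a placeholder):
#   proxy_env http://proxy.example.com:3128 \
#       az1lb01.mydomain.novalocal az1master01.mydomain.novalocal \
#       >> /etc/sysconfig/atomic-openshift-master-api
```

The affected services need a restart afterwards to pick up the new
environment.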


>
>   Post installer I have manually added those settings into
> "/etc/sysconfig/atomic-openshift-master",
> "/etc/sysconfig/atomic-openshift-master-controllers",
> "/etc/sysconfig/atomic-openshift-master-api" and, respectively for the
> nodes, "/etc/sysconfig/atomic-openshift-node", but don't know how to do
> this via the installer itself.
>
>
> - Is there an issue with the masters doubling as "etcd" nodes ?
>

No, there should not be any issues with co-locating the etcd service
alongside the masters.


>
>
> The most frustrating part  is that I have this very setup working
> perfectly fine in a public cloud environment (namely on GCE) , but with the
> (three) "etcd" hosts distinct from the masters (i.e. total of 9 hosts
> instead of 6), and with unproxied Internet access.... However, that
> installation is from a different repo branch (namely from
> https://github.com/detiber/openshift-ansible, the "gceFixes" branch)
>

I *believe* all of the fixes from gceFixes have been merged into master at
this point.


>
>
> Thanks a lot for the help,
>
> Florian
>
> P.S. The weirdest case wrt certificates is when trying to check the "etcd"
> cluster:
>
>
> [root@az1master01 ~]# etcdctl --debug  -C
> https://az1master01.mydomain.novalocal:2379,
> https://az3master02.mydomain.novalocal:2379,
> https://az3master03.mydomain.novalocal:2379 --ca-file
> /etc/origin/master/ca.crt  --cert-file
> /etc/origin/master/master.etcd-client.crt     --key-file
> /etc/origin/master/master.etcd-client.key cluster-health
> Cluster-Endpoints: https://az3master02.mydomain.novalocal:2379,
> https://az1master01.mydomain.novalocal:2379,
> https://az3master03.mydomain.novalocal:2379
> cURL Command: curl -X GET
> https://az3master02.mydomain.novalocal:2379/v2/members
> cURL Command: curl -X GET
> https://az1master01.mydomain.novalocal:2379/v2/members
> cURL Command: curl -X GET
> https://az3master03.mydomain.novalocal:2379/v2/members
> cluster may be unhealthy: failed to list members
> Error:  client: etcd cluster is unavailable or misconfigured
> error #0: x509: certificate signed by unknown authority
> error #1: x509: certificate signed by unknown authority
> error #2: x509: certificate signed by unknown authority
>

You need to use the etcd CA cert here:

etcdctl --debug \
  -C https://az1master01.mydomain.novalocal:2379,https://az3master02.mydomain.novalocal:2379,https://az3master03.mydomain.novalocal:2379 \
  --ca-file /etc/origin/master/master.etcd-ca.crt \
  --cert-file /etc/origin/master/master.etcd-client.crt \
  --key-file /etc/origin/master/master.etcd-client.key \
  cluster-health
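
The same applies to the direct curl attempt quoted further down: the
server certificate there is issued by etcd-signer, so --cacert has to point
at the etcd CA rather than /etc/origin/master/ca.crt. A sketch, assuming the
same file paths as in the etcdctl command above (etcd_members is just an
illustrative helper name):

```shell
# Paths as used elsewhere in this thread; adjust if your layout differs.
ETCD_CA=/etc/origin/master/master.etcd-ca.crt
ETCD_CERT=/etc/origin/master/master.etcd-client.crt
ETCD_KEY=/etc/origin/master/master.etcd-client.key

# etcd_members ENDPOINT: ask one etcd endpoint for its member list,
# verifying the server against the etcd CA.
etcd_members() {
    curl --silent --show-error \
         --cacert "$ETCD_CA" --cert "$ETCD_CERT" --key "$ETCD_KEY" \
         "$1/v2/members"
}

# Example (run on a master):
#   etcd_members https://az1master01.mydomain.novalocal:2379
```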


>
>
>
> Attempting a direct curl to "etcd":
>
> [root@az1master01 ~]# curl -v   --cacert /etc/origin/master/ca.crt --cert
> /etc/origin/master/master.etcd-client.crt     --key
> /etc/origin/master/master.etcd-client.key
> https://az1master01.mydomain.novalocal:2379/v2/members
> * About to connect() to az1master01.mydomain.novalocal port 2379 (#0)
> *   Trying 10.0.0.22...
> * Connected to az1master01.mydomain.novalocal (10.0.0.22) port 2379 (#0)
> * Initializing NSS with certpath: sql:/etc/pki/nssdb
> *   CAfile: /etc/origin/master/ca.crt
>   CApath: none
> * Server certificate:
> * subject: CN=az1master01.mydomain.novalocal
> * start date: Feb 24 19:38:07 2016 GMT
> * expire date: Feb 23 19:38:07 2017 GMT
> * common name: az1master01.mydomain.novalocal
> * issuer: CN=etcd-signer@1456342665
> * NSS error -8179 (SEC_ERROR_UNKNOWN_ISSUER)
> * Peer's Certificate issuer is not recognized.
> * Closing connection 0
> curl: (60) Peer's Certificate issuer is not recognized.
> More details here: http://curl.haxx.se/docs/sslcerts.html
>
> curl performs SSL certificate verification by default, using a "bundle"
>  of Certificate Authority (CA) public keys (CA certs). If the default
>  bundle file isn't adequate, you can specify an alternate file
>  using the --cacert option.
> If this HTTPS server uses a certificate signed by a CA represented in
>  the bundle, the certificate verification probably failed due to a
>  problem with the certificate (it might be expired, or the name might
>  not match the domain name in the URL).
> If you'd like to turn off curl's verification of the certificate, use
>  the -k (or --insecure) option.
> [root@az1master01 ~]#
>
>
>
>
> _______________________________________________
> users mailing list
> [email protected]
> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>
>


-- 
Jason DeTiberus
