Re: Help debug "oc login" returning "401" / certificate issues

Florian Daniel Otel Thu, 25 Feb 2016 14:12:47 -0800

Hi Jason,

Kindest thanks for trying to help.


In order:

1) Indeed, the "lb" host is configured (via dnsmasq) as a DNS forwarder,
has the correct "/etc/hosts" (which is propagated to all the other hosts in
the cluster), and all hosts have an entry pointing to it in their
"/etc/resolv.conf".

2) A bit puzzled wrt "system:node" vs "system:anonymous"....

I've just tested the corresponding curl call on another system where
everything works as expected (at least so far...), and the response I get
back from a GET to "/api/v1/namespaces" still refers to "system:anonymous",
and not "system:node".

Also, to make things even weirder, if I copy the node "kubeconfig" to
".kube/config" I am identified correctly (i.e. as "system:node") when
doing an "oc whoami".
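
One detail in the verbose curl output quoted below may explain the
"system:anonymous" result: the line "NSS: client certificate not found:
/etc/origin/node/system" shows the certificate path cut off at the first
colon. curl's "--cert" option takes the form "<certificate[:password]>", so
the rest of "system:node:..." appears to be read as a password and no client
certificate is presented. A minimal sketch of a workaround, assuming that is
the cause: copy the certificate and key to colon-free names and point curl at
the copies. The helper name "colon_free_copy" and the temporary directory are
hypothetical, not part of OpenShift or curl.

```shell
#!/bin/sh
# Hypothetical helper: copy a file whose name contains ":" to a
# colon-free name inside a target directory and print the new path.
# curl parses --cert as <certificate[:password]>, so a ":" in the
# path would otherwise be taken as the password delimiter.
colon_free_copy() {
    src=$1
    dest_dir=$2
    # Replace every ":" in the base name with "_".
    base=$(basename "$src" | tr ':' '_')
    cp -p "$src" "$dest_dir/$base" || return 1
    printf '%s\n' "$dest_dir/$base"
}

# Example use (paths taken from the curl call in the original post;
# run on the node, as root):
#   tmp=$(mktemp -d)
#   crt=$(colon_free_copy "/etc/origin/node/system:node:az1node01.mydomain.novalocal.crt" "$tmp")
#   key=$(colon_free_copy "/etc/origin/node/system:node:az1node01.mydomain.novalocal.key" "$tmp")
#   curl -v --cacert /etc/origin/node/ca.crt --cert "$crt" --key "$key" \
#       https://az1lb01.mydomain.novalocal:8443/api/v1/namespaces
```

If the response then names the "system:node" user instead of
"system:anonymous", the certificates themselves are fine and only the curl
invocation was at fault. The curl man page also describes escaping the colon
as "\:" for NSS builds; whether curl 7.29.0 honours that is not verified
here.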


3) Thanks for pointing out that specifying "HTTP_PROXY", "HTTPS_PROXY" and
"NO_PROXY" is not yet possible via the Ansible installer.

My remaining question is: is there any way to debug the authentication
process, i.e. why "oc login" with the "htpasswd" back end doesn't work?
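
One local check that would at least rule out a propagation problem is to
confirm that the htpasswd file is byte-identical on every master. A minimal
sketch, assuming copies of each master's "/etc/origin/htpasswd" have been
fetched to local files and that "sha256sum" from coreutils is available (the
function name "same_content" and the file names in the example are
hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) only if all given files are
# readable and have the same SHA-256 checksum, i.e. identical content.
# Returns 2 on bad usage or an unreadable file, 1 if contents differ.
same_content() {
    [ "$#" -ge 1 ] || return 2
    for f in "$@"; do
        [ -r "$f" ] || return 2
    done
    # One checksum per file; keep the distinct values and count them.
    distinct=$(sha256sum -- "$@" | awk '{print $1}' | sort -u | wc -l)
    [ "$distinct" -eq 1 ]
}

# Example use, with local copies fetched from the three masters:
#   same_content htpasswd.master01 htpasswd.master02 htpasswd.master03 \
#       && echo "identical" || echo "files differ or are unreadable"
```

This only shows whether the files match; it says nothing about whether the
masters actually read that file or accept the password.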


Thanks again,

/Florian



On Thu, Feb 25, 2016 at 10:30 PM, Jason DeTiberus <[email protected]>
wrote:

>
>
> On Thu, Feb 25, 2016 at 10:54 AM, Florian Daniel Otel <
> [email protected]> wrote:
>
>>
>> Hello all,
>>
>> I have the following problems:
>>
>> I have a multimaster OSE setup consisting of the following:
>> - A LB with "native" HA
>> - Three masters (doubling as "etcd" nodes)
>> - Two nodes
>>
>>
>> All the hosts are themselves OpenStack instances (hence the ".novalocal"
>> suffix). DNS is via an "/etc/hosts" propagated across, with the "lb" host
>> doubling as DNS forwarder (via dnsmasq). All Internet access is via an http
>> / https proxy.
>>
>
> So, if I'm understanding this correctly, then the lb host is correctly
> resolving the dns for all of the *.novalocal addresses that are in use by
> the cluster and all of the hosts are pre-configured to use the lb host as
> the dns resolver prior to running the installation? If not, then there will
> definitely be issues, since /etc/hosts is not used by deployed containers.
>
>
>>
>> After many attempts we finally got a setup that is somewhat working (see
>> P.S. for why "somewhat"). Attached is the "/etc/ansible/hosts" file.
>> Installation is from the main "openshift-ansible" repo (
>> https://github.com/openshift/openshift-ansible)
>>
>> My problem:
>>
>> After installation, on one master I created two users in
>> "/etc/origin/htpasswd". After creation I have propagated the file to all
>> the other masters. UNIX permissions to the file on all masters are "0600"
>>
>> However, doing an "oc login" returns a "401 Unauthorized", and I cannot
>> find what the issue is, or how to debug it (no trace for it in the
>> "atomic-openshift-master-api" or "atomic-openshift-master-controllers"
>> logs).
>>
>
>>
>> [root@az1node01 ~]# oc login
>> Authentication required for https://az1lb01.mydomain.novalocal:8443
>> (openshift)
>> Username: reguser
>> Password:
>> Login failed (401 Unauthorized)
>> Unauthorized
>>
>>
>> The puzzling thing is that using the "system:node" certificates and keys
>> work (in the sense I am identified as "system:anonymous"):
>>
>
> Something is definitely not right here, the user for the system:node certs
> should be identified as the system:node user and not anonymous. I suspect
> that there is a larger issue at play here.
>
> It looks like the initial cluster creation may have had issues...  The
> atomic-openshift-master-api logs should provide more insight into what may
> have gone wrong.
>
>
>>
>>
>> curl -v --cacert  /etc/origin/node/ca.crt --cert
>> "/etc/origin/node/system:node:az1node01.mydomain.novalocal.crt" --key
>> "/etc/origin/node/system:node:az1node01.mydomain.novalocal.key"
>> https://az1lb01.mydomain.novalocal:8443/api/v1/namespaces
>> * About to connect() to az1lb01.mydomain.novalocal port 8443 (#0)
>> *   Trying 10.0.0.31...
>> * Connected to az1lb01.mydomain.novalocal (10.0.0.31) port 8443 (#0)
>> * Initializing NSS with certpath: sql:/etc/pki/nssdb
>> *   CAfile: /etc/origin/node/ca.crt
>>   CApath: none
>> * NSS: client certificate not found: /etc/origin/node/system
>> * SSL connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
>> * Server certificate:
>> *       subject: CN=10.0.0.24
>> *       start date: Feb 24 19:40:56 2016 GMT
>> *       expire date: Feb 23 19:40:57 2018 GMT
>> *       common name: 10.0.0.24
>> *       issuer: CN=openshift-signer@1456342841
>> > GET /api/v1/namespaces HTTP/1.1
>> > User-Agent: curl/7.29.0
>> > Host: az1lb01.mydomain.novalocal:8443
>> > Accept: */*
>> >
>> < HTTP/1.1 403 Forbidden
>> < Cache-Control: no-store
>> < Content-Type: application/json
>> < Date: Thu, 25 Feb 2016 14:42:41 GMT
>> < Content-Length: 255
>> <
>> {
>>   "kind": "Status",
>>   "apiVersion": "v1",
>>   "metadata": {},
>>   "status": "Failure",
>>   "message": "User \"system:anonymous\" cannot list all namespaces in the
>> cluster",
>>   "reason": "Forbidden",
>>   "details": {
>>     "kind": "namespaces"
>>   },
>>   "code": 403
>> }
>> * Connection #0 to host az1lb01.mydomain.novalocal left intact
>>
>> Attached is also the master configuration file for one master.
>>
>>
>> My questions:
>>
>> - I had many issues in getting the installation working, mostly due to
>> the Ansible installer reading the OpenStack instance metadata, and
>> inconsistencies btw. that and the "hostname".
>>
>>   Is there any particular repo / branch of the installer that is known to
>> work in this particular setup ? Any particular settings I should use in the
>> Ansible hosts file ?
>>
>>   I suspect the certificate issues I'm encountering are because of that
>> (in combination with the proxy) but I'm not sure.
>>
>> - Operating behind an HTTP / HTTPS proxy: Even before starting the
>> Ansible installer, Docker was (properly) configured to the HTTP / HTTPS
>> proxy settings, and working correctly. However, for the installer itself I
>> found no way to express the "HTTP_PROXY" "HTTPS_PROXY" and, particularly,
>> the "NO_PROXY" settings.  For that I'm relying on exported environment
>> variables in the shell. Is there a "proper" way to do this via the
>> installer itself?
>>
>
> There is an openshift-ansible PR to expose this directly (
> https://github.com/openshift/openshift-ansible/pull/1385)
>
>
>>
>>   Post installer I have manually added those settings into
>> "/etc/sysconfig/atomic-openshift-master",
>> "/etc/sysconfig/atomic-openshift-master-controllers",
>> "/etc/sysconfig/atomic-openshift-master-api" and, respectively for the
>> nodes, "/etc/sysconfig/atomic-openshift-node", but don't know how to do
>> this via the installer itself.
>>
>>
>> - Is there an issue with the masters doubling as "etcd" nodes ?
>>
>
> No, there should not be any issues with co-locating the etcd service
> alongside the masters.
>
>
>>
>>
>> The most frustrating part  is that I have this very setup working
>> perfectly fine in a public cloud environment (namely on GCE) , but with the
>> (three) "etcd" hosts distinct from the masters (i.e. total of 9 hosts
>> instead of 6), and with unproxied Internet access.... However, that
>> installation is from a different repo branch (namely from
>> "https://github.com/detiber/openshift-ansible", from the "gceFixes" branch)
>>
>
> I *believe* all of the fixes from gceFixes have been merged into master at
> this point.
>
>
>>
>>
>> Thanks a lot for the help,
>>
>> Florian
>>
>> P.S. The weirdest case wrt certificates is when trying to check the
>> "etcd" cluster:
>>
>>
>> [root@az1master01 ~]# etcdctl --debug  -C
>> https://az1master01.mydomain.novalocal:2379,
>> https://az3master02.mydomain.novalocal:2379,
>> https://az3master03.mydomain.novalocal:2379 --ca-file
>> /etc/origin/master/ca.crt  --cert-file
>> /etc/origin/master/master.etcd-client.crt     --key-file
>> /etc/origin/master/master.etcd-client.key cluster-health
>> Cluster-Endpoints: https://az3master02.mydomain.novalocal:2379,
>> https://az1master01.mydomain.novalocal:2379,
>> https://az3master03.mydomain.novalocal:2379
>> cURL Command: curl -X GET
>> https://az3master02.mydomain.novalocal:2379/v2/members
>> cURL Command: curl -X GET
>> https://az1master01.mydomain.novalocal:2379/v2/members
>> cURL Command: curl -X GET
>> https://az3master03.mydomain.novalocal:2379/v2/members
>> cluster may be unhealthy: failed to list members
>> Error:  client: etcd cluster is unavailable or misconfigured
>> error #0: x509: certificate signed by unknown authority
>> error #1: x509: certificate signed by unknown authority
>> error #2: x509: certificate signed by unknown authority
>>
>
> You need to use the etcd ca cert here: etcdctl --debug  -C
> https://az1master01.mydomain.novalocal:2379,
> https://az3master02.mydomain.novalocal:2379,
> https://az3master03.mydomain.novalocal:2379 --ca-file
> /etc/origin/master/master.etcd-ca.crt  --cert-file
> /etc/origin/master/master.etcd-client.crt     --key-file
> /etc/origin/master/master.etcd-client.key cluster-health
>
>
>>
>>
>>
>> Attempting a direct curl to "etcd":
>>
>> [root@az1master01 ~]# curl -v   --cacert /etc/origin/master/ca.crt
>> --cert /etc/origin/master/master.etcd-client.crt     --key
>> /etc/origin/master/master.etcd-client.key
>> https://az1master01.mydomain.novalocal:2379/v2/members
>> * About to connect() to az1master01.mydomain.novalocal port 2379 (#0)
>> *   Trying 10.0.0.22...
>> * Connected to az1master01.mydomain.novalocal (10.0.0.22) port 2379 (#0)
>> * Initializing NSS with certpath: sql:/etc/pki/nssdb
>> *   CAfile: /etc/origin/master/ca.crt
>>   CApath: none
>> * Server certificate:
>> * subject: CN=az1master01.mydomain.novalocal
>> * start date: Feb 24 19:38:07 2016 GMT
>> * expire date: Feb 23 19:38:07 2017 GMT
>> * common name: az1master01.mydomain.novalocal
>> * issuer: CN=etcd-signer@1456342665
>> * NSS error -8179 (SEC_ERROR_UNKNOWN_ISSUER)
>> * Peer's Certificate issuer is not recognized.
>> * Closing connection 0
>> curl: (60) Peer's Certificate issuer is not recognized.
>> More details here: http://curl.haxx.se/docs/sslcerts.html
>>
>> curl performs SSL certificate verification by default, using a "bundle"
>>  of Certificate Authority (CA) public keys (CA certs). If the default
>>  bundle file isn't adequate, you can specify an alternate file
>>  using the --cacert option.
>> If this HTTPS server uses a certificate signed by a CA represented in
>>  the bundle, the certificate verification probably failed due to a
>>  problem with the certificate (it might be expired, or the name might
>>  not match the domain name in the URL).
>> If you'd like to turn off curl's verification of the certificate, use
>>  the -k (or --insecure) option.
>> [root@az1master01 ~]#
>>
>>
>>
>>
>> _______________________________________________
>> users mailing list
>> [email protected]
>> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>>
>>
>
>
> --
> Jason DeTiberus
>

