Re: Help debug "oc login" returning "401" / certificate issues

Jason DeTiberus Thu, 25 Feb 2016 18:25:04 -0800

On Thu, Feb 25, 2016 at 5:03 PM, Florian Daniel Otel <[email protected]> wrote:


> Hi Jason,
>
> Kindest thanks for trying to help.
>
> In order:
>
> 1) Indeed, the "lb" host is configured (via dnsmasq) as a DNS forwarder,
> has the correct "/etc/hosts" (which is propagated to all the other hosts in
> the cluster), and all hosts have an entry pointing to it in the
> "/etc/resolv.conf"
>
> 2) A bit puzzled wrt "system:node" vs "system:anonymous"....
>
> I've just tested the corresponding curl call on another system where
> everything works as expected (at least so far...), and the response I get
> back from a GET to "/api/v1/namespaces" still refers to "system:anonymous",
> not "system:node".
>
> Also, to make things even weirder, if I copy the node "kubeconfig" to
> ".kube/config", I am identified accordingly (i.e. as "system:node") when
> doing an "oc whoami".
>

I'm probably missing something about the way the node identifies itself
when using client certificate authentication. I'm seeing the same behavior
on a system of mine that is functioning as expected.


>
>
> 3) Thanks for pointing out that specifying "HTTP_PROXY" / "HTTPS_PROXY"
> and, respectively, "NO_PROXY" is not yet possible via the Ansible installer.
>
> My remaining question is: Is there any way to debug the authentication
> process, i.e. why the "oc login" with the "htpasswd" back end doesn't work?
>

You will most likely need to increase the logging level to see
authentication logs for the API service. In
/etc/sysconfig/atomic-openshift-master-api, increasing the loglevel to 4
should provide output around the authentication failure.
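
As a rough sketch of that change: the helper below bumps the level in a
sysconfig-style file. It assumes the file carries a line such as
"OPTIONS=--loglevel=2 ...", which is how I would expect the installer to
write it but is worth checking first. set_loglevel is only a hypothetical
helper name.

```shell
# Sketch: set --loglevel=N in a sysconfig-style file.
# Assumes the file has a line such as "OPTIONS=--loglevel=2 ...".
set_loglevel() {
    local file="$1"
    local level="$2"
    local tmp rc

    # Refuse to guess if the flag is not there at all.
    if ! grep -q -- '--loglevel=[0-9][0-9]*' "$file"; then
        echo "no --loglevel flag found in $file" >&2
        return 1
    fi

    tmp="$(mktemp)" || return 1
    # Rewrite via a temp file, then write back with cat so the original
    # file's owner and mode are kept.
    sed "s/--loglevel=[0-9][0-9]*/--loglevel=${level}/" "$file" > "$tmp" \
        && cat "$tmp" > "$file"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Example usage on each master (as root):
#   set_loglevel /etc/sysconfig/atomic-openshift-master-api 4
#   systemctl restart atomic-openshift-master-api
#   journalctl -u atomic-openshift-master-api -f
```

The service has to be restarted for the new level to take effect. Following
the journal while repeating the "oc login" should then show the
authentication attempt.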



>
>
> Thanks again,
>
> /Florian
>
>
>
> On Thu, Feb 25, 2016 at 10:30 PM, Jason DeTiberus <[email protected]>
> wrote:
>
>>
>>
>> On Thu, Feb 25, 2016 at 10:54 AM, Florian Daniel Otel <
>> [email protected]> wrote:
>>
>>>
>>> Hello all,
>>>
>>> I have the following problems:
>>>
>>> I have a multimaster OSE setup consisting of the following:
>>> - A LB with "native" HA
>>> - Three masters (doubling as "etcd" nodes)
>>> - Two nodes
>>>
>>>
>>> All the hosts are themselves OpenStack instances (hence the ".novalocal"
>>> suffix). DNS is via an "/etc/hosts" propagated across, with the "lb" host
>>> doubling as DNS forwarder (via dnsmasq). All Internet access is via an http
>>> / https proxy.
>>>
>>
>> So, if I'm understanding this correctly, then the lb host is correctly
>> resolving the DNS for all of the *.novalocal addresses that are in use by
>> the cluster and all of the hosts are pre-configured to use the lb host as
>> the DNS resolver prior to running the installation? If not, then there will
>> definitely be issues, since /etc/hosts is not used by deployed containers.
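
One way to check this, as a sketch: confirm that the first nameserver in
/etc/resolv.conf is the lb host, then query dnsmasq directly so that
/etc/hosts is bypassed. first_nameserver is a hypothetical helper, and
10.0.0.31 is the address az1lb01 resolves to in the curl output quoted
further down.

```shell
# Sketch: print the first "nameserver" entry of a resolv.conf-style file.
# Defaults to /etc/resolv.conf when no path is given.
first_nameserver() {
    awk '$1 == "nameserver" { print $2; exit }' "${1:-/etc/resolv.conf}"
}

# Example usage on any cluster host (dig comes from bind-utils and may
# need to be installed first):
#   first_nameserver
#   dig @10.0.0.31 +short az1master01.mydomain.novalocal
```

If the dig query returns nothing while the name resolves locally, the name
is probably only known through /etc/hosts on that host.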
>>
>>
>>>
>>> After many attempts we finally got a setup that is somewhat working (see
>>> P.S. for why "somewhat"). Attached is the "/etc/ansible/hosts" file.
>>> Installation is from the main "openshift-ansible" repo (
>>> https://github.com/openshift/openshift-ansible)
>>>
>>> My problem:
>>>
>>> After installation, on one master I created two users in
>>> "/etc/origin/htpasswd". After creation I propagated the file to all
>>> the other masters. UNIX permissions on the file on all masters are "0600".
>>>
>>> However, doing an "oc login" returns a "401 Unauthorized", and I cannot
>>> find what the issue is, or how to debug it (no trace for it in the
>>> "atomic-openshift-master-api" or "atomic-openshift-master-controllers"
>>> logs).
>>>
>>
>>>
>>> [root@az1node01 ~]# oc login
>>> Authentication required for https://az1lb01.mydomain.novalocal:8443
>>> (openshift)
>>> Username: reguser
>>> Password:
>>> Login failed (401 Unauthorized)
>>> Unauthorized
>>>
>>>
>>> The puzzling thing is that using the "system:node" certificates and keys
>>> works (in the sense that I am identified as "system:anonymous"):
>>>
>>
>> Something is definitely not right here: the user for the system:node
>> certs should be identified as the system:node user and not anonymous. I
>> suspect that there is a larger issue at play here.
>>
>> It looks like the initial cluster creation may have had issues...  The
>> atomic-openshift-master-api logs should provide more insight into what may
>> have gone wrong.
>>
>>
>>>
>>>
>>> curl -v --cacert  /etc/origin/node/ca.crt --cert
>>> "/etc/origin/node/system:node:az1node01.mydomain.novalocal.crt" --key
>>> "/etc/origin/node/system:node:az1node01.mydomain.novalocal.key"
>>> https://az1lb01.mydomain.novalocal:8443/api/v1/namespaces
>>> * About to connect() to az1lb01.mydomain.novalocal port 8443 (#0)
>>> *   Trying 10.0.0.31...
>>> * Connected to az1lb01.mydomain.novalocal (10.0.0.31) port 8443 (#0)
>>> * Initializing NSS with certpath: sql:/etc/pki/nssdb
>>> *   CAfile: /etc/origin/node/ca.crt
>>>   CApath: none
>>> * NSS: client certificate not found: /etc/origin/node/system
>>> * SSL connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
>>> * Server certificate:
>>> *       subject: CN=10.0.0.24
>>> *       start date: Feb 24 19:40:56 2016 GMT
>>> *       expire date: Feb 23 19:40:57 2018 GMT
>>> *       common name: 10.0.0.24
>>> *       issuer: CN=openshift-signer@1456342841
>>> > GET /api/v1/namespaces HTTP/1.1
>>> > User-Agent: curl/7.29.0
>>> > Host: az1lb01.mydomain.novalocal:8443
>>> > Accept: */*
>>> >
>>> < HTTP/1.1 403 Forbidden
>>> < Cache-Control: no-store
>>> < Content-Type: application/json
>>> < Date: Thu, 25 Feb 2016 14:42:41 GMT
>>> < Content-Length: 255
>>> <
>>> {
>>>   "kind": "Status",
>>>   "apiVersion": "v1",
>>>   "metadata": {},
>>>   "status": "Failure",
>>>   "message": "User \"system:anonymous\" cannot list all namespaces in
>>> the cluster",
>>>   "reason": "Forbidden",
>>>   "details": {
>>>     "kind": "namespaces"
>>>   },
>>>   "code": 403
>>> }
>>> * Connection #0 to host az1lb01.mydomain.novalocal left intact
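
A note on the output above: the line "NSS: client certificate not found:
/etc/origin/node/system" suggests that curl cut the --cert argument at the
first colon, since --cert takes the form <certificate[:password]>. If that
is what happened, no client certificate was sent at all, which would explain
the "system:anonymous" identity. One possible workaround, sketched under
that assumption, is to copy the cert and key to colon-free names before
calling curl. colon_free_copy is a hypothetical helper.

```shell
# Sketch: copy a file whose name contains ':' into a target directory under
# a colon-free name, and print the new path.
colon_free_copy() {
    local src="$1"
    local dest_dir="$2"
    local base dest

    # Replace every ':' in the file name with '_'.
    base="$(basename "$src" | tr ':' '_')"
    dest="${dest_dir}/${base}"
    cp -- "$src" "$dest" && printf '%s\n' "$dest"
}

# Example usage on the node (as root):
#   tmp="$(mktemp -d)"
#   crt="$(colon_free_copy "/etc/origin/node/system:node:az1node01.mydomain.novalocal.crt" "$tmp")"
#   key="$(colon_free_copy "/etc/origin/node/system:node:az1node01.mydomain.novalocal.key" "$tmp")"
#   curl -v --cacert /etc/origin/node/ca.crt --cert "$crt" --key "$key" \
#       https://az1lb01.mydomain.novalocal:8443/api/v1/namespaces
#   rm -rf "$tmp"
```

If the "client certificate not found" line disappears and the API then
reports a "system:node" user, the colon in the file name was the culprit.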
>>>
>>> Attached is also the master configuration file for one master.
>>>
>>>
>>> My questions:
>>>
>>> - I had many issues in getting the installation working, mostly due to
>>> the Ansible installer reading the OpenStack instance metadata, and
>>> inconsistencies between that and the "hostname".
>>>
>>>   Is there any particular repo / branch of the installer that is known
>>> to work in this particular setup? Any particular settings I should use in
>>> the Ansible hosts file?
>>>
>>>   I suspect the certificate issues I'm encountering are because of that
>>> (in combination with the proxy), but I'm not sure.
>>>
>>> - Operating behind an HTTP / HTTPS proxy: Even before starting the
>>> Ansible installer, Docker was (properly) configured with the HTTP / HTTPS
>>> proxy settings, and working correctly. However, for the installer itself I
>>> found no way to express the "HTTP_PROXY", "HTTPS_PROXY" and, particularly,
>>> the "NO_PROXY" settings. For that I'm relying on exported environment
>>> variables in the shell. Is there a "proper" way to do this via the
>>> installer itself?
>>>
>>
>> There is an openshift-ansible PR to expose this directly (
>> https://github.com/openshift/openshift-ansible/pull/1385)
>>
>>
>>>
>>>   Post-installation I have manually added those settings to
>>> "/etc/sysconfig/atomic-openshift-master",
>>> "/etc/sysconfig/atomic-openshift-master-controllers",
>>> "/etc/sysconfig/atomic-openshift-master-api" and, respectively for the
>>> nodes, "/etc/sysconfig/atomic-openshift-node", but I don't know how to do
>>> this via the installer itself.
>>>
>>>
>>> - Is there an issue with the masters doubling as "etcd" nodes ?
>>>
>>
>> No, there should not be any issues with co-locating the etcd service
>> alongside the masters.
>>
>>
>>>
>>>
>>> The most frustrating part is that I have this very setup working
>>> perfectly fine in a public cloud environment (namely on GCE), but with the
>>> (three) "etcd" hosts distinct from the masters (i.e. a total of 9 hosts
>>> instead of 6), and with unproxied Internet access.... However, that
>>> installation is from a different repo branch (namely from
>>> "https://github.com/detiber/openshift-ansible", from the "gceFixes"
>>> branch).
>>>
>>
>> I *believe* all of the fixes from gceFixes have been merged into master
>> at this point.
>>
>>
>>>
>>>
>>> Thanks a lot for the help,
>>>
>>> Florian
>>>
>>> P.S. The weirdest case wrt certificates is when trying to check the
>>> "etcd" cluster:
>>>
>>>
>>> [root@az1master01 ~]# etcdctl --debug  -C
>>> https://az1master01.mydomain.novalocal:2379,
>>> https://az3master02.mydomain.novalocal:2379,
>>> https://az3master03.mydomain.novalocal:2379 --ca-file
>>> /etc/origin/master/ca.crt  --cert-file
>>> /etc/origin/master/master.etcd-client.crt     --key-file
>>> /etc/origin/master/master.etcd-client.key cluster-health
>>> Cluster-Endpoints: https://az3master02.mydomain.novalocal:2379,
>>> https://az1master01.mydomain.novalocal:2379,
>>> https://az3master03.mydomain.novalocal:2379
>>> cURL Command: curl -X GET
>>> https://az3master02.mydomain.novalocal:2379/v2/members
>>> cURL Command: curl -X GET
>>> https://az1master01.mydomain.novalocal:2379/v2/members
>>> cURL Command: curl -X GET
>>> https://az3master03.mydomain.novalocal:2379/v2/members
>>> cluster may be unhealthy: failed to list members
>>> Error:  client: etcd cluster is unavailable or misconfigured
>>> error #0: x509: certificate signed by unknown authority
>>> error #1: x509: certificate signed by unknown authority
>>> error #2: x509: certificate signed by unknown authority
>>>
>>
>> You need to use the etcd ca cert here: etcdctl --debug  -C
>> https://az1master01.mydomain.novalocal:2379,
>> https://az3master02.mydomain.novalocal:2379,
>> https://az3master03.mydomain.novalocal:2379 --ca-file
>> /etc/origin/master/master.etcd-ca.crt  --cert-file
>> /etc/origin/master/master.etcd-client.crt     --key-file
>> /etc/origin/master/master.etcd-client.key cluster-health
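
The same applies to the direct curl call quoted further down: the "issuer"
line in the curl -v output tells you which CA file to pass. A small sketch
of that mapping follows. ca_for_issuer is a hypothetical helper, and pairing
openshift-signer with /etc/origin/master/ca.crt is an assumption based on
the default file layout.

```shell
# Sketch: pick the CA file that matches a certificate's issuer string.
ca_for_issuer() {
    case "$1" in
        *etcd-signer@*)
            # etcd server certs are signed by the etcd CA.
            echo /etc/origin/master/master.etcd-ca.crt ;;
        *openshift-signer@*)
            # API server certs are signed by the OpenShift CA (assumed path).
            echo /etc/origin/master/ca.crt ;;
        *)
            echo "unknown issuer: $1" >&2
            return 1 ;;
    esac
}

# Example usage on a master, with the issuer taken from the curl -v output:
#   ca="$(ca_for_issuer "CN=etcd-signer@1456342665")"
#   curl -v --cacert "$ca" \
#       --cert /etc/origin/master/master.etcd-client.crt \
#       --key /etc/origin/master/master.etcd-client.key \
#       https://az1master01.mydomain.novalocal:2379/v2/members
```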
>>
>>
>>>
>>>
>>>
>>> Attempting a direct curl to "etcd":
>>>
>>> [root@az1master01 ~]# curl -v   --cacert /etc/origin/master/ca.crt
>>> --cert /etc/origin/master/master.etcd-client.crt     --key
>>> /etc/origin/master/master.etcd-client.key
>>> https://az1master01.mydomain.novalocal:2379/v2/members
>>> * About to connect() to az1master01.mydomain.novalocal port 2379 (#0)
>>> *   Trying 10.0.0.22...
>>> * Connected to az1master01.mydomain.novalocal (10.0.0.22) port 2379 (#0)
>>> * Initializing NSS with certpath: sql:/etc/pki/nssdb
>>> *   CAfile: /etc/origin/master/ca.crt
>>>   CApath: none
>>> * Server certificate:
>>> * subject: CN=az1master01.mydomain.novalocal
>>> * start date: Feb 24 19:38:07 2016 GMT
>>> * expire date: Feb 23 19:38:07 2017 GMT
>>> * common name: az1master01.mydomain.novalocal
>>> * issuer: CN=etcd-signer@1456342665
>>> * NSS error -8179 (SEC_ERROR_UNKNOWN_ISSUER)
>>> * Peer's Certificate issuer is not recognized.
>>> * Closing connection 0
>>> curl: (60) Peer's Certificate issuer is not recognized.
>>> More details here: http://curl.haxx.se/docs/sslcerts.html
>>>
>>> curl performs SSL certificate verification by default, using a "bundle"
>>>  of Certificate Authority (CA) public keys (CA certs). If the default
>>>  bundle file isn't adequate, you can specify an alternate file
>>>  using the --cacert option.
>>> If this HTTPS server uses a certificate signed by a CA represented in
>>>  the bundle, the certificate verification probably failed due to a
>>>  problem with the certificate (it might be expired, or the name might
>>>  not match the domain name in the URL).
>>> If you'd like to turn off curl's verification of the certificate, use
>>>  the -k (or --insecure) option.
>>> [root@az1master01 ~]#
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> users mailing list
>>> [email protected]
>>> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>>>
>>>
>>
>>
>> --
>> Jason DeTiberus
>>
>
>


-- 
Jason DeTiberus
