Hey, sorry to insist... anything I can do from here?

> On 07 Sep 2016, at 19:05, Candide Kemmler <[email protected]> wrote:
>
>> On 06 Sep 2016, at 18:00, Clayton Coleman <[email protected]> wrote:
>>
>> What auth mechanism backs your "admin" user?
>>
>
> .htpasswd
>
> Thanks for the follow-up.
>
> Candide
>
>> On Sep 6, 2016, at 10:19 AM, Candide Kemmler <[email protected]> wrote:
>>
>>> Yes, that seems to be OK..., although I'm not sure I know exactly what
>>> the "root cluster cert" is, so I checked all of the following:
>>>
>>> [root@paas master]# openssl x509 -enddate -noout -in cloudapps.router.pem
>>> notAfter=Apr 21 16:38:31 2018 GMT
>>> [root@paas master]# openssl x509 -enddate -noout -in ca.crt
>>> notAfter=Apr 20 16:31:56 2021 GMT
>>> [root@paas master]# openssl x509 -enddate -noout -in master.server.crt
>>> notAfter=Apr 21 16:32:00 2018 GMT
>>> [root@paas master]# openssl x509 -enddate -noout -in etcd.server.crt
>>> notAfter=Apr 21 16:32:01 2018 GMT
>>> [root@paas master]# openssl x509 -enddate -noout -in admin.crt
>>> notAfter=Apr 21 16:31:58 2018 GMT
>>> [root@paas master]# openssl x509 -enddate -noout -in ca-bundle.crt
>>> notAfter=Apr 20 16:31:56 2021 GMT
>>> [root@paas master]# openssl x509 -enddate -noout -in openshift-master.crt
>>> notAfter=Apr 21 16:31:57 2018 GMT
>>> [root@paas master]# openssl x509 -enddate -noout -in openshift-registry.crt
>>> notAfter=Apr 21 16:32:00 2018 GMT
>>>
>>>> On 06 Sep 2016, at 15:04, Clayton Coleman <[email protected]> wrote:
>>>>
>>>> Were you able to check the expiration date on your admin root cluster
>>>> cert and verify it has not expired?
>>>>
>>>> On Sep 6, 2016, at 5:19 AM, Candide Kemmler <[email protected]> wrote:
>>>>
>>>>> Hi Clayton,
>>>>>
>>>>> Thanks! Here's the result of running `sudo oadm diagnostics`. I'm
>>>>> particularly bothered by the "the server has asked for the client to
>>>>> provide credentials" message, as I see the same one when I try to run
>>>>> the ansible scripts. Do you know how to solve it?
>>>>>
>>>>> Any other ideas on things I should focus on?
>>>>>
>>>>> Regards,
>>>>>
>>>>> Candide
>>>>>
>>>>> [Note] Determining if client configuration exists for client/cluster diagnostics
>>>>> Info: Successfully read a client config file at '/root/.kube/config'
>>>>> [Note] Could not configure a client, so client diagnostics are limited to testing configuration and connection
>>>>> Info: Using context for cluster-admin access: 'default/paas-intrinsic-world:8443/system:admin'
>>>>> [Note] Performing systemd discovery
>>>>>
>>>>> [Note] Running diagnostic: ConfigContexts[logging/paas-intrinsic-world:8443/admin]
>>>>>        Description: Validate client config context is complete and has connectivity
>>>>>
>>>>> ERROR: [DCli0014 from diagnostic ConfigContexts@openshift/origin/pkg/diagnostics/client/config_contexts.go:285]
>>>>>        For client config context 'logging/paas-intrinsic-world:8443/admin':
>>>>>        The server URL is 'https://paas.intrinsic.world:8443'
>>>>>        The user authentication is 'admin/paas-intrinsic-world:8443'
>>>>>        The current project is 'logging'
>>>>>        (*errors.StatusError) the server has asked for the client to provide credentials
>>>>>
>>>>>        This means that when we tried to make a request to the master API
>>>>>        server, the request required credentials that were not presented.
>>>>>        This can happen with an expired or invalid authentication token.
>>>>>        Try logging in with this user again.
>>>>>
>>>>> [Note] Running diagnostic: ConfigContexts[logging/paas-intrinsic-world:8443/system:admin]
>>>>>        Description: Validate client config context is complete and has connectivity
>>>>>
>>>>> Info: For client config context 'logging/paas-intrinsic-world:8443/system:admin':
>>>>>        The server URL is 'https://paas.intrinsic.world:8443'
>>>>>        The user authentication is 'system:admin/paas-intrinsic-world:8443'
>>>>>        The current project is 'logging'
>>>>>        Successfully requested project list; has access to project(s):
>>>>>        [openshift-infra dev ieml-demo logging management-infra misc openshift p2p default ieml-dev ...]
>>>>>
>>>>> [Note] Running diagnostic: ClusterRegistry
>>>>>        Description: Check that there is a working Docker registry
>>>>>
>>>>> WARN: [DClu1009 from diagnostic ClusterRegistry@openshift/origin/pkg/diagnostics/cluster/registry.go:217]
>>>>>        The "docker-registry-1-8w93s" pod for the "docker-registry" service is not running.
>>>>>        This may be transient, a scheduling error, or something else.
>>>>>
>>>>> ERROR: [DClu1001 from diagnostic ClusterRegistry@openshift/origin/pkg/diagnostics/cluster/registry.go:173]
>>>>>        The "docker-registry" service exists but no pods currently running, so it
>>>>>        is not available. Builds and deployments that use the registry will fail.
>>>>>
>>>>> [Note] Running diagnostic: ClusterRoleBindings
>>>>>        Description: Check that the default ClusterRoleBindings are present and contain the expected subjects
>>>>>
>>>>> Info: clusterrolebinding/cluster-admins has more subjects than expected.
>>>>>        Use the `oadm policy reconcile-cluster-role-bindings` command to
>>>>>        update the role binding to remove extra subjects.
>>>>> Info: clusterrolebinding/cluster-admins has extra subject {User admin }.
>>>>>
>>>>> Info: clusterrolebinding/cluster-readers has more subjects than expected.
>>>>>        Use the `oadm policy reconcile-cluster-role-bindings` command to
>>>>>        update the role binding to remove extra subjects.
>>>>> Info: clusterrolebinding/cluster-readers has extra subject {ServiceAccount management-infra management-admin }.
>>>>> Info: clusterrolebinding/cluster-readers has extra subject {ServiceAccount logging aggregated-logging-fluentd }.
>>>>>
>>>>> [Note] Running diagnostic: ClusterRoles
>>>>>        Description: Check that the default ClusterRoles are present and contain the expected permissions
>>>>>
>>>>> [Note] Running diagnostic: ClusterRouterName
>>>>>        Description: Check there is a working router
>>>>>
>>>>> ERROR: [DClu2007 from diagnostic ClusterRouter@openshift/origin/pkg/diagnostics/cluster/router.go:156]
>>>>>        The "router" DeploymentConfig exists but has no running pods, so it
>>>>>        is not available. Apps will not be externally accessible via the router.
>>>>>
>>>>> [Note] Running diagnostic: MasterNode
>>>>>        Description: Check if master is also running node (for Open vSwitch)
>>>>>
>>>>> Info: Found a node with same IP as master: paas.intrinsic.world
>>>>>
>>>>> [Note] Running diagnostic: NodeDefinitions
>>>>>        Description: Check node records on master
>>>>>
>>>>> WARN: [DClu0003 from diagnostic NodeDefinition@openshift/origin/pkg/diagnostics/cluster/node_definitions.go:112]
>>>>>        Node paas.intrinsic.world is ready but is marked Unschedulable.
>>>>>        This is usually set manually for administrative reasons.
>>>>>        An administrator can mark the node schedulable with:
>>>>>          oadm manage-node paas.intrinsic.world --schedulable=true
>>>>>
>>>>>        While in this state, pods should not be scheduled to deploy on the node.
>>>>>        Existing pods will continue to run until completed or evacuated (see
>>>>>        other options for 'oadm manage-node').
>>>>>
>>>>> [Note] Running diagnostic: AnalyzeLogs
>>>>>        Description: Check for recent problems in systemd service logs
>>>>>
>>>>> Info: Checking journalctl logs for 'origin-master' service
>>>>> Info: Checking journalctl logs for 'origin-node' service
>>>>> Info: Checking journalctl logs for 'docker' service
>>>>>
>>>>> [Note] Running diagnostic: MasterConfigCheck
>>>>>        Description: Check the master config file
>>>>>
>>>>> Info: Found a master config file: /etc/origin/master/master-config.yaml
>>>>>
>>>>> WARN: [DH0005 from diagnostic MasterConfigCheck@openshift/origin/pkg/diagnostics/host/check_master_config.go:58]
>>>>>        Validation of master config file '/etc/origin/master/master-config.yaml' warned:
>>>>>        assetConfig.loggingPublicURL: Invalid value: "": required to view aggregated container logs in the console
>>>>>        assetConfig.metricsPublicURL: Invalid value: "": required to view cluster metrics in the console
>>>>>
>>>>> [Note] Running diagnostic: NodeConfigCheck
>>>>>        Description: Check the node config file
>>>>>
>>>>> Info: Found a node config file: /etc/origin/node/node-config.yaml
>>>>>
>>>>> [Note] Running diagnostic: UnitStatus
>>>>>        Description: Check status for related systemd units
>>>>>
>>>>> [Note] Summary of diagnostics execution (version v1.1.6):
>>>>> [Note] Warnings seen: 3
>>>>> [Note] Errors seen: 4
>>>>>
>>>>>> On 05 Sep 2016, at 18:46, Clayton Coleman <[email protected]> wrote:
>>>>>>
>>>>>> Did you change the IP of your master, or otherwise delete / alter the
>>>>>> openshift-infra namespace? Or have your client certificates expired
>>>>>> (is this cluster 1 year old)?
>>>>>>
>>>>>> Before deleting, try two things:
>>>>>>
>>>>>> oadm diagnostics
>>>>>>
>>>>>> from the master (to see if it identifies anything).
>>>>>>
>>>>>> Also check your certificate expiration.
>>>>>>
>>>>>>> On Sep 5, 2016, at 5:00 AM, Candide Kemmler <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have a development server setup made up of two nodes (1 master, 1
>>>>>>> slave) running a bunch of different projects and environments, which
>>>>>>> just crashed badly on me.
>>>>>>>
>>>>>>> Symptoms: all containers in all projects are in a Pending state
>>>>>>> (orange circle). When I try to `delete all`, things get removed, but
>>>>>>> pods hang in a 'Terminating' state. `oc describe` gives me
>>>>>>> uninteresting information that I already know (basically that the
>>>>>>> pods are Pending), and `oc logs` tells me that it "could not find the
>>>>>>> requested resource".
>>>>>>>
>>>>>>> I tried `sudo systemctl restart origin-master`, as it seems to have
>>>>>>> produced good results in the past, but that didn't help this time. I
>>>>>>> also tried that in combination with a full system reboot.
>>>>>>>
>>>>>>> Finally, I tried running the ansible scripts in hopes of updating
>>>>>>> origin to the latest version (it's still running 1.1.6), but I got
>>>>>>> the following error log:
>>>>>>>
>>>>>>> failed: [paas.intrinsic.world] => {"changed": false, "cmd": ["oc",
>>>>>>> "create", "-n", "openshift", "-f",
>>>>>>> "/usr/share/openshift/examples/image-streams/image-streams-centos7.json"],
>>>>>>> "delta": "0:00:00.180874", "end": "2016-09-05 07:20:12.050123",
>>>>>>> "failed": true, "failed_when_result": true, "rc": 1, "start":
>>>>>>> "2016-09-05 07:20:11.869249", "stdout_lines": [], "warnings": []}
>>>>>>> stderr: unable to connect to a server to handle "imagestreamlists":
>>>>>>> the server has asked for the client to provide credentials
>>>>>>>
>>>>>>> FATAL: all hosts have already failed -- aborting
>>>>>>>
>>>>>>> PLAY RECAP ********************************************************************
>>>>>>> to retry, use: --limit @/Users/candide/config.retry
>>>>>>>
>>>>>>> apps.intrinsic.world : ok=48   changed=0   unreachable=0   failed=0
>>>>>>> localhost            : ok=15   changed=0   unreachable=0   failed=0
>>>>>>> paas.intrinsic.world : ok=207  changed=0   unreachable=0   failed=1
>>>>>>>
>>>>>>> My last option is to reinstall everything from scratch, but before I
>>>>>>> do that I wanted to know if you guys had other ideas on how to get on
>>>>>>> top of things again.
>>>>>>>
>>>>>>> Candide
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> users mailing list
>>>>>>> [email protected]
>>>>>>> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
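In the meantime, here is a rough sketch of what I plan to try in order to rule
out an expired OAuth token for the htpasswd-backed "admin" user (the htpasswd
file path below is a guess on my part and may not match this install):

    # check that the admin entry still exists in the htpasswd file the master uses
    grep '^admin:' /etc/origin/master/htpasswd

    # log in again as the htpasswd-backed user to obtain a fresh OAuth token
    oc login https://paas.intrinsic.world:8443 -u admin

    # verify the new token is accepted by the API server
    oc whoami
    oc whoami -t

    # the system:admin context authenticates with client certificates rather
    # than a token, so it should keep working even if the admin token expired
    # (admin.kubeconfig is the usual location for an openshift-ansible install)
    oc --config=/etc/origin/master/admin.kubeconfig get nodes

If `oc login` succeeds but the diagnostics error persists, that would point at
something other than a stale token.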
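Separately, if I read the diagnostics output right, the only node
(paas.intrinsic.world) is marked Unschedulable, which by itself would keep the
registry, router and application pods stuck in Pending. A small sketch of what
I intend to check next (the registry pod name is copied from the diagnostics
above and may have changed since then):

    # mark the node schedulable again, as the diagnostics output itself suggests
    oadm manage-node paas.intrinsic.world --schedulable=true

    # the node should no longer report SchedulingDisabled
    oc get nodes

    # watch whether the docker-registry and router pods leave the Pending state
    oc get pods -n default -o wide

    # if a pod stays Pending, the events listed by describe usually say why
    oc describe pod docker-registry-1-8w93s -n default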

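Finally, since the ansible run fails on the same "provide credentials" error,
I want to reproduce the failing step by hand on the master before re-running
the whole playbook, using the certificate-based admin kubeconfig so token
problems are taken out of the picture. This is only a sketch: the command and
file path are copied from the failure output above, and it may simply report
that the image streams already exist, which would be fine for this test.

    # the exact command the playbook ran, but with the cert-based admin kubeconfig
    oc --config=/etc/origin/master/admin.kubeconfig create -n openshift \
        -f /usr/share/openshift/examples/image-streams/image-streams-centos7.json

    # if that works, re-run the playbook and resume from the failed hosts
    # (playbook path omitted here; the retry file is the one ansible wrote out)
    ansible-playbook <playbook>.yml --limit @/Users/candide/config.retry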