> On 06 Sep 2016, at 18:00, Clayton Coleman <[email protected]> wrote:
>
> What auth mechanism backs your "admin" user?
.htpasswd

Thanks for the follow-up,

Candide

> On Sep 6, 2016, at 10:19 AM, Candide Kemmler <[email protected]> wrote:
>
>> Yes, that seems to be OK, although I'm not sure I know exactly what the
>> "root cluster cert" is, so I checked all the following:
>>
>> [root@paas master]# openssl x509 -enddate -noout -in cloudapps.router.pem
>> notAfter=Apr 21 16:38:31 2018 GMT
>> [root@paas master]# openssl x509 -enddate -noout -in ca.crt
>> notAfter=Apr 20 16:31:56 2021 GMT
>> [root@paas master]# openssl x509 -enddate -noout -in master.server.crt
>> notAfter=Apr 21 16:32:00 2018 GMT
>> [root@paas master]# openssl x509 -enddate -noout -in etcd.server.crt
>> notAfter=Apr 21 16:32:01 2018 GMT
>> [root@paas master]# openssl x509 -enddate -noout -in admin.crt
>> notAfter=Apr 21 16:31:58 2018 GMT
>> [root@paas master]# openssl x509 -enddate -noout -in ca-bundle.crt
>> notAfter=Apr 20 16:31:56 2021 GMT
>> [root@paas master]# openssl x509 -enddate -noout -in openshift-master.crt
>> notAfter=Apr 21 16:31:57 2018 GMT
>> [root@paas master]# openssl x509 -enddate -noout -in openshift-registry.crt
>> notAfter=Apr 21 16:32:00 2018 GMT
>>
>>> On 06 Sep 2016, at 15:04, Clayton Coleman <[email protected]> wrote:
>>>
>>> Were you able to check the expiration date on your admin root cluster cert
>>> and verify it has not expired?
>>>
>>> On Sep 6, 2016, at 5:19 AM, Candide Kemmler <[email protected]> wrote:
>>>
>>>> Hi Clayton,
>>>>
>>>> Thanks! Here's the result of running `sudo oadm diagnostics`. I'm
>>>> particularly bothered by the "the server has asked for the client to
>>>> provide credentials" message, as I'm seeing it when I try to execute
>>>> the Ansible scripts as well. Do you know how to solve it?
>>>>
>>>> Any other ideas on things I should focus on?
>>>>
>>>> Regards,
>>>>
>>>> Candide
>>>>
>>>>
>>>> [Note] Determining if client configuration exists for client/cluster diagnostics
>>>> Info:  Successfully read a client config file at '/root/.kube/config'
>>>> [Note] Could not configure a client, so client diagnostics are limited to
>>>>        testing configuration and connection
>>>> Info:  Using context for cluster-admin access:
>>>>        'default/paas-intrinsic-world:8443/system:admin'
>>>> [Note] Performing systemd discovery
>>>>
>>>> [Note] Running diagnostic: ConfigContexts[logging/paas-intrinsic-world:8443/admin]
>>>>        Description: Validate client config context is complete and has connectivity
>>>>
>>>> ERROR: [DCli0014 from diagnostic ConfigContexts@openshift/origin/pkg/diagnostics/client/config_contexts.go:285]
>>>>        For client config context 'logging/paas-intrinsic-world:8443/admin':
>>>>        The server URL is 'https://paas.intrinsic.world:8443'
>>>>        The user authentication is 'admin/paas-intrinsic-world:8443'
>>>>        The current project is 'logging'
>>>>        (*errors.StatusError) the server has asked for the client to provide credentials
>>>>
>>>>        This means that when we tried to make a request to the master API
>>>>        server, the request required credentials that were not presented. This
>>>>        can happen with an expired or invalid authentication token. Try logging
>>>>        in with this user again.
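The diagnostic itself suggests the likely fix: the cached token for 'admin' has expired or been invalidated, and logging in again mints a new one. A minimal check, reusing the server URL from the output above (the htpasswd provider will prompt for the password):

    oc login https://paas.intrinsic.world:8443 -u admin
    oc whoami -t    # prints the bearer token now cached in ~/.kube/config

If `oc login` succeeds but the diagnostic error persists, the problem is elsewhere than the token.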
>>>> [Note] Running diagnostic: ConfigContexts[logging/paas-intrinsic-world:8443/system:admin]
>>>>        Description: Validate client config context is complete and has connectivity
>>>>
>>>> Info:  For client config context 'logging/paas-intrinsic-world:8443/system:admin':
>>>>        The server URL is 'https://paas.intrinsic.world:8443'
>>>>        The user authentication is 'system:admin/paas-intrinsic-world:8443'
>>>>        The current project is 'logging'
>>>>        Successfully requested project list; has access to project(s):
>>>>          [openshift-infra dev ieml-demo logging management-infra misc openshift p2p default ieml-dev ...]
>>>>
>>>> [Note] Running diagnostic: ClusterRegistry
>>>>        Description: Check that there is a working Docker registry
>>>>
>>>> WARN:  [DClu1009 from diagnostic ClusterRegistry@openshift/origin/pkg/diagnostics/cluster/registry.go:217]
>>>>        The "docker-registry-1-8w93s" pod for the "docker-registry" service is not running.
>>>>        This may be transient, a scheduling error, or something else.
>>>>
>>>> ERROR: [DClu1001 from diagnostic ClusterRegistry@openshift/origin/pkg/diagnostics/cluster/registry.go:173]
>>>>        The "docker-registry" service exists but no pods currently running, so it
>>>>        is not available. Builds and deployments that use the registry will fail.
>>>>
>>>> [Note] Running diagnostic: ClusterRoleBindings
>>>>        Description: Check that the default ClusterRoleBindings are present
>>>>        and contain the expected subjects
>>>>
>>>> Info:  clusterrolebinding/cluster-admins has more subjects than expected.
>>>>        Use the `oadm policy reconcile-cluster-role-bindings` command to
>>>>        update the role binding to remove extra subjects.
>>>> Info:  clusterrolebinding/cluster-admins has extra subject {User admin }.
>>>>
>>>> Info:  clusterrolebinding/cluster-readers has more subjects than expected.
>>>>        Use the `oadm policy reconcile-cluster-role-bindings` command to
>>>>        update the role binding to remove extra subjects.
>>>> Info:  clusterrolebinding/cluster-readers has extra subject {ServiceAccount management-infra management-admin }.
>>>> Info:  clusterrolebinding/cluster-readers has extra subject {ServiceAccount logging aggregated-logging-fluentd }.
>>>>
>>>> [Note] Running diagnostic: ClusterRoles
>>>>        Description: Check that the default ClusterRoles are present and
>>>>        contain the expected permissions
>>>>
>>>> [Note] Running diagnostic: ClusterRouterName
>>>>        Description: Check there is a working router
>>>>
>>>> ERROR: [DClu2007 from diagnostic ClusterRouter@openshift/origin/pkg/diagnostics/cluster/router.go:156]
>>>>        The "router" DeploymentConfig exists but has no running pods, so it
>>>>        is not available. Apps will not be externally accessible via the router.
>>>>
>>>> [Note] Running diagnostic: MasterNode
>>>>        Description: Check if master is also running node (for Open vSwitch)
>>>>
>>>> Info:  Found a node with same IP as master: paas.intrinsic.world
>>>>
>>>> [Note] Running diagnostic: NodeDefinitions
>>>>        Description: Check node records on master
>>>>
>>>> WARN:  [DClu0003 from diagnostic NodeDefinition@openshift/origin/pkg/diagnostics/cluster/node_definitions.go:112]
>>>>        Node paas.intrinsic.world is ready but is marked Unschedulable.
>>>>        This is usually set manually for administrative reasons.
>>>>        An administrator can mark the node schedulable with:
>>>>          oadm manage-node paas.intrinsic.world --schedulable=true
>>>>
>>>>        While in this state, pods should not be scheduled to deploy on the node.
>>>>        Existing pods will continue to run until completed or evacuated (see
>>>>        other options for 'oadm manage-node').
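That Unschedulable warning is worth dwelling on: if the registry and router pods are meant to land on this node, its unschedulable state would explain both the "no pods currently running" errors above and pods stuck in Pending generally. A quick check, using the node name exactly as it appears in the diagnostic:

    oadm manage-node paas.intrinsic.world --schedulable=true
    oc get nodes    # the node should no longer show SchedulingDisabled

This is the same command the diagnostic suggests; the `oc get nodes` call just confirms the change took effect.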
>>>> [Note] Running diagnostic: AnalyzeLogs
>>>>        Description: Check for recent problems in systemd service logs
>>>>
>>>> Info:  Checking journalctl logs for 'origin-master' service
>>>> Info:  Checking journalctl logs for 'origin-node' service
>>>> Info:  Checking journalctl logs for 'docker' service
>>>>
>>>> [Note] Running diagnostic: MasterConfigCheck
>>>>        Description: Check the master config file
>>>>
>>>> Info:  Found a master config file: /etc/origin/master/master-config.yaml
>>>>
>>>> WARN:  [DH0005 from diagnostic MasterConfigCheck@openshift/origin/pkg/diagnostics/host/check_master_config.go:58]
>>>>        Validation of master config file '/etc/origin/master/master-config.yaml' warned:
>>>>        assetConfig.loggingPublicURL: Invalid value: "": required to view
>>>>        aggregated container logs in the console
>>>>        assetConfig.metricsPublicURL: Invalid value: "": required to view
>>>>        cluster metrics in the console
>>>>
>>>> [Note] Running diagnostic: NodeConfigCheck
>>>>        Description: Check the node config file
>>>>
>>>> Info:  Found a node config file: /etc/origin/node/node-config.yaml
>>>>
>>>> [Note] Running diagnostic: UnitStatus
>>>>        Description: Check status for related systemd units
>>>>
>>>> [Note] Summary of diagnostics execution (version v1.1.6):
>>>> [Note] Warnings seen: 3
>>>> [Note] Errors seen: 4
>>>>
>>>>> On 05 Sep 2016, at 18:46, Clayton Coleman <[email protected]> wrote:
>>>>>
>>>>> Did you change the IP of your master, or otherwise delete / alter the
>>>>> openshift-infra namespace? Or have your client certificates expired
>>>>> (is this cluster 1 year old)?
>>>>>
>>>>> Before deleting, try two things:
>>>>>
>>>>>   oadm diagnostics
>>>>>
>>>>> from the master (to see if it identifies anything).
>>>>>
>>>>> Also check your certificate expiration.
>>>>>
>>>>>> On Sep 5, 2016, at 5:00 AM, Candide Kemmler <[email protected]> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have a development server setup made up of two nodes (1 master, 1
>>>>>> slave) running a bunch of different projects and environments, which just
>>>>>> crashed badly on me.
>>>>>>
>>>>>> Symptoms: all containers in all projects are in a Pending state
>>>>>> (orange circle). When I try to `delete all`, things get removed, but
>>>>>> pods hang in a 'Terminating' state. `oc describe` gives me uninteresting
>>>>>> information that I already know (basically that pods are Pending), and
>>>>>> `oc logs` tells me that it "could not find the requested resource".
>>>>>>
>>>>>> I tried `sudo systemctl restart origin-master`, as it seems to have
>>>>>> produced good results in the past, but that didn't help this time. I
>>>>>> also tried that in combination with a full system reboot.
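Since AnalyzeLogs above checked the origin-master, origin-node, and docker units without flagging anything, it can still be worth eyeballing recent entries directly after a restart like this. A quick look at the same units the diagnostic covered:

    journalctl -u origin-master --since "1 hour ago" | tail -n 50
    journalctl -u origin-node --since "1 hour ago" | tail -n 50

An authentication or certificate problem on the master tends to show up here as repeated TLS handshake or 'x509' errors.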
>>>>>> Finally, I tried running the Ansible scripts in hopes of updating Origin
>>>>>> to the latest version (it's still running 1.1.6), but I got the following
>>>>>> error log:
>>>>>>
>>>>>> failed: [paas.intrinsic.world] => {"changed": false, "cmd": ["oc",
>>>>>> "create", "-n", "openshift", "-f",
>>>>>> "/usr/share/openshift/examples/image-streams/image-streams-centos7.json"],
>>>>>> "delta": "0:00:00.180874", "end": "2016-09-05 07:20:12.050123",
>>>>>> "failed": true, "failed_when_result": true, "rc": 1, "start":
>>>>>> "2016-09-05 07:20:11.869249", "stdout_lines": [], "warnings": []}
>>>>>> stderr: unable to connect to a server to handle "imagestreamlists": the
>>>>>> server has asked for the client to provide credentials
>>>>>>
>>>>>> FATAL: all hosts have already failed -- aborting
>>>>>>
>>>>>> PLAY RECAP ********************************************************************
>>>>>>            to retry, use: --limit @/Users/candide/config.retry
>>>>>>
>>>>>> apps.intrinsic.world : ok=48   changed=0   unreachable=0   failed=0
>>>>>> localhost            : ok=15   changed=0   unreachable=0   failed=0
>>>>>> paas.intrinsic.world : ok=207  changed=0   unreachable=0   failed=1
>>>>>>
>>>>>> My last option is to reinstall everything from scratch, but before I do
>>>>>> that I wanted to know if you had other ideas on how to get on top of
>>>>>> things again.
>>>>>>
>>>>>> Candide
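One check worth making before a reinstall: the diagnostics above show that the system:admin context works (the project list succeeded) while the htpasswd-backed 'admin' context fails, which points at a stale login token rather than broken cluster certificates. A way to confirm, bypassing tokens entirely with the cluster-admin kubeconfig (the path below is the default location on an Origin master; adjust if yours differs):

    oc --config=/etc/origin/master/admin.kubeconfig get nodes

If that works, the API server itself is healthy, and refreshing the 'admin' login with `oc login` should clear both the diagnostics error and the Ansible failure.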
_______________________________________________
users mailing list
[email protected]
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
