Hi Clayton,

Thanks! Here's the result of running `sudo oadm diagnostics`. I'm particularly 
bothered by the "the server has asked for the client to provide credentials" 
message, as I'm seeing the same error when I try to run the Ansible scripts as 
well. Do you know how to solve it?

Any other ideas on things I should focus on?

Regards,

Candide


[Note] Determining if client configuration exists for client/cluster diagnostics
Info:  Successfully read a client config file at '/root/.kube/config'
[Note] Could not configure a client, so client diagnostics are limited to 
testing configuration and connection
Info:  Using context for cluster-admin access: 
'default/paas-intrinsic-world:8443/system:admin'
[Note] Performing systemd discovery

[Note] Running diagnostic: 
ConfigContexts[logging/paas-intrinsic-world:8443/admin]
       Description: Validate client config context is complete and has 
connectivity

ERROR: [DCli0014 from diagnostic 
ConfigContexts@openshift/origin/pkg/diagnostics/client/config_contexts.go:285]
       For client config context 'logging/paas-intrinsic-world:8443/admin':
       The server URL is 'https://paas.intrinsic.world:8443'
       The user authentication is 'admin/paas-intrinsic-world:8443'
       The current project is 'logging'
       (*errors.StatusError) the server has asked for the client to provide 
credentials

       This means that when we tried to make a request to the master API
       server, the request required credentials that were not presented. This
       can happen with an expired or invalid authentication token. Try logging
       in with this user again.

[Note] Running diagnostic: 
ConfigContexts[logging/paas-intrinsic-world:8443/system:admin]
       Description: Validate client config context is complete and has 
connectivity

Info:  For client config context 
'logging/paas-intrinsic-world:8443/system:admin':
       The server URL is 'https://paas.intrinsic.world:8443'
       The user authentication is 'system:admin/paas-intrinsic-world:8443'
       The current project is 'logging'
       Successfully requested project list; has access to project(s):
         [openshift-infra dev ieml-demo logging management-infra misc openshift 
p2p default ieml-dev ...]

[Note] Running diagnostic: ClusterRegistry
       Description: Check that there is a working Docker registry

WARN:  [DClu1009 from diagnostic 
ClusterRegistry@openshift/origin/pkg/diagnostics/cluster/registry.go:217]
       The "docker-registry-1-8w93s" pod for the "docker-registry" service is 
not running.
       This may be transient, a scheduling error, or something else.

ERROR: [DClu1001 from diagnostic 
ClusterRegistry@openshift/origin/pkg/diagnostics/cluster/registry.go:173]
       The "docker-registry" service exists but no pods currently running, so it
       is not available. Builds and deployments that use the registry will fail.

[Note] Running diagnostic: ClusterRoleBindings
       Description: Check that the default ClusterRoleBindings are present and 
contain the expected subjects

Info:  clusterrolebinding/cluster-admins has more subjects than expected.

       Use the `oadm policy reconcile-cluster-role-bindings` command to update 
the role binding to remove extra subjects.

Info:  clusterrolebinding/cluster-admins has extra subject {User  admin    }.

Info:  clusterrolebinding/cluster-readers has more subjects than expected.

       Use the `oadm policy reconcile-cluster-role-bindings` command to update 
the role binding to remove extra subjects.

Info:  clusterrolebinding/cluster-readers has extra subject {ServiceAccount 
management-infra management-admin    }.
Info:  clusterrolebinding/cluster-readers has extra subject {ServiceAccount 
logging aggregated-logging-fluentd    }.

[Note] Running diagnostic: ClusterRoles
       Description: Check that the default ClusterRoles are present and contain 
the expected permissions

[Note] Running diagnostic: ClusterRouterName
       Description: Check there is a working router

ERROR: [DClu2007 from diagnostic 
ClusterRouter@openshift/origin/pkg/diagnostics/cluster/router.go:156]
       The "router" DeploymentConfig exists but has no running pods, so it
       is not available. Apps will not be externally accessible via the router.

[Note] Running diagnostic: MasterNode
       Description: Check if master is also running node (for Open vSwitch)

Info:  Found a node with same IP as master: paas.intrinsic.world

[Note] Running diagnostic: NodeDefinitions
       Description: Check node records on master

WARN:  [DClu0003 from diagnostic 
NodeDefinition@openshift/origin/pkg/diagnostics/cluster/node_definitions.go:112]
       Node paas.intrinsic.world is ready but is marked Unschedulable.
       This is usually set manually for administrative reasons.
       An administrator can mark the node schedulable with:
           oadm manage-node paas.intrinsic.world --schedulable=true

       While in this state, pods should not be scheduled to deploy on the node.
       Existing pods will continue to run until completed or evacuated (see
       other options for 'oadm manage-node').

[Note] Running diagnostic: AnalyzeLogs
       Description: Check for recent problems in systemd service logs

Info:  Checking journalctl logs for 'origin-master' service
Info:  Checking journalctl logs for 'origin-node' service
Info:  Checking journalctl logs for 'docker' service

[Note] Running diagnostic: MasterConfigCheck
       Description: Check the master config file

Info:  Found a master config file: /etc/origin/master/master-config.yaml

WARN:  [DH0005 from diagnostic 
MasterConfigCheck@openshift/origin/pkg/diagnostics/host/check_master_config.go:58]
       Validation of master config file '/etc/origin/master/master-config.yaml' 
warned:
       assetConfig.loggingPublicURL: Invalid value: "": required to view 
aggregated container logs in the console
       assetConfig.metricsPublicURL: Invalid value: "": required to view 
cluster metrics in the console

[Note] Running diagnostic: NodeConfigCheck
       Description: Check the node config file

Info:  Found a node config file: /etc/origin/node/node-config.yaml

[Note] Running diagnostic: UnitStatus
       Description: Check status for related systemd units

[Note] Summary of diagnostics execution (version v1.1.6):
[Note] Warnings seen: 3
[Note] Errors seen: 4
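
Based on the hints in that output, here's what I'm planning to try next. The 
re-login command uses the master URL from the config above (commented out below 
since it needs the live master), and the openssl expiry check is demonstrated on 
a throwaway self-signed cert since I'm not at the server right now; on the master 
I'd point it at the real client cert instead (on a default install, something 
like /etc/origin/master/admin.crt):

```shell
# Re-authenticate the 'admin' user; the diagnostics suggest an expired or
# invalid token. Commented out here since it needs the live master:
# oc login https://paas.intrinsic.world:8443 -u admin

# Certificate expiry check, demonstrated on a throwaway self-signed cert.
# On the master, point -in at the real client cert instead:
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=expiry-demo" \
  -keyout /tmp/demo-key.pem -out /tmp/demo-cert.pem -days 30 2>/dev/null
openssl x509 -enddate -noout -in /tmp/demo-cert.pem   # prints a notAfter= date
```

If notAfter on the real cert turns out to be in the past, that would explain 
both the diagnostics error and the Ansible failure.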



> On 05 Sep 2016, at 18:46, Clayton Coleman <[email protected]> wrote:
> 
> Did you change the IP of your master, or otherwise delete / alter the
> openshift-infra namespace?  Or have your client certificates expired
> (is this cluster 1 year old)?
> 
> Before deleting, try two things:
> 
>    oadm diagnostics
> 
> From the master (to see if it identifies anything).
> 
> Also check your certificate expiration.
> 
>> On Sep 5, 2016, at 5:00 AM, Candide Kemmler <[email protected]> wrote:
>> 
>> Hi,
>> 
>> I have a development server setup made up of two nodes (1 master - 1 slave) 
>> running a bunch of different projects and environments which just crashed 
>> badly on me.
>> 
>> Symptoms are: all containers in all projects are in pending state (orange 
>> circle) - when I try to `delete all`, things get removed but pods hang in a 
>> 'terminating' state. `oc describe` gives me uninteresting information that I 
>> already know (basically that pods are Pending), and `oc logs` tells me it 
>> "could not find the requested resource".
>> 
>> I tried to `sudo systemctl restart origin-master` as it seems to have 
>> produced good results in the past, but that didn't help this time. I also 
>> tried that in combination with a full system reboot.
>> 
>> Finally I tried running the ansible scripts in hopes of updating origin to 
>> the latest version (it's still running 1.1.6) but I got the following error 
>> log:
>> 
>> failed: [paas.intrinsic.world] => {"changed": false, "cmd": ["oc", "create", 
>> "-n", "openshift", "-f", 
>> "/usr/share/openshift/examples/image-streams/image-streams-centos7.json"], 
>> "delta": "0:00:00.180874", "end": "2016-09-05 07:20:12.050123", "failed": 
>> true, "failed_when_result": true, "rc": 1, "start": "2016-09-05 
>> 07:20:11.869249", "stdout_lines": [], "warnings": []}
>> stderr: unable to connect to a server to handle "imagestreamlists": the 
>> server has asked for the client to provide credentials
>> 
>> FATAL: all hosts have already failed -- aborting
>> 
>> PLAY RECAP 
>> ********************************************************************
>>          to retry, use: --limit @/Users/candide/config.retry
>> 
>> apps.intrinsic.world       : ok=48   changed=0    unreachable=0    failed=0
>> localhost                  : ok=15   changed=0    unreachable=0    failed=0
>> paas.intrinsic.world       : ok=207  changed=0    unreachable=0    failed=1
>> 
>> My last option is to reinstall everything from scratch but before I do this 
>> I wanted to know if you guys had other ideas on how to get on top of things 
>> again.
>> 
>> Candide
>> 
>> _______________________________________________
>> users mailing list
>> [email protected]
>> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
