Hi Clayton,
Thanks! Here's the result of running `sudo oadm diagnostics`. I'm particularly
bothered by the "the server has asked for the client to provide credentials"
message, since I'm seeing the same one when I run the Ansible scripts as
well. Do you know how to solve it?
Any other ideas on things I should focus on?
Regards,
Candide
[Note] Determining if client configuration exists for client/cluster diagnostics
Info: Successfully read a client config file at '/root/.kube/config'
[Note] Could not configure a client, so client diagnostics are limited to
testing configuration and connection
Info: Using context for cluster-admin access:
'default/paas-intrinsic-world:8443/system:admin'
[Note] Performing systemd discovery
[Note] Running diagnostic:
ConfigContexts[logging/paas-intrinsic-world:8443/admin]
Description: Validate client config context is complete and has
connectivity
ERROR: [DCli0014 from diagnostic
ConfigContexts@openshift/origin/pkg/diagnostics/client/config_contexts.go:285]
For client config context 'logging/paas-intrinsic-world:8443/admin':
The server URL is 'https://paas.intrinsic.world:8443'
The user authentication is 'admin/paas-intrinsic-world:8443'
The current project is 'logging'
(*errors.StatusError) the server has asked for the client to provide
credentials
This means that when we tried to make a request to the master API
server, the request required credentials that were not presented. This
can happen with an expired or invalid authentication token. Try logging
in with this user again.
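The fix the diagnostic suggests can be attempted from the master. A minimal sketch, assuming the certificate-backed `system:admin` context still works (as the next diagnostic below confirms) and that the `admin` user authenticates with a password:

```shell
# Refresh the expired token by logging the 'admin' user in again
oc login https://paas.intrinsic.world:8443 -u admin

# Or fall back to the certificate-based system:admin context, which
# does not rely on an OAuth token at all:
oc config use-context default/paas-intrinsic-world:8443/system:admin
oc whoami
```

The Ansible failure quotes the same error, so the playbook is most likely picking up the same stale `admin` token from the kubeconfig it uses.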
[Note] Running diagnostic:
ConfigContexts[logging/paas-intrinsic-world:8443/system:admin]
Description: Validate client config context is complete and has
connectivity
Info: For client config context
'logging/paas-intrinsic-world:8443/system:admin':
The server URL is 'https://paas.intrinsic.world:8443'
The user authentication is 'system:admin/paas-intrinsic-world:8443'
The current project is 'logging'
Successfully requested project list; has access to project(s):
[openshift-infra dev ieml-demo logging management-infra misc openshift
p2p default ieml-dev ...]
[Note] Running diagnostic: ClusterRegistry
Description: Check that there is a working Docker registry
WARN: [DClu1009 from diagnostic
ClusterRegistry@openshift/origin/pkg/diagnostics/cluster/registry.go:217]
The "docker-registry-1-8w93s" pod for the "docker-registry" service is
not running.
This may be transient, a scheduling error, or something else.
ERROR: [DClu1001 from diagnostic
ClusterRegistry@openshift/origin/pkg/diagnostics/cluster/registry.go:173]
The "docker-registry" service exists but no pods currently running, so it
is not available. Builds and deployments that use the registry will fail.
[Note] Running diagnostic: ClusterRoleBindings
Description: Check that the default ClusterRoleBindings are present and
contain the expected subjects
Info: clusterrolebinding/cluster-admins has more subjects than expected.
Use the `oadm policy reconcile-cluster-role-bindings` command to update
the role binding to remove extra subjects.
Info: clusterrolebinding/cluster-admins has extra subject {User admin }.
Info: clusterrolebinding/cluster-readers has more subjects than expected.
Use the `oadm policy reconcile-cluster-role-bindings` command to update
the role binding to remove extra subjects.
Info: clusterrolebinding/cluster-readers has extra subject {ServiceAccount
management-infra management-admin }.
Info: clusterrolebinding/cluster-readers has extra subject {ServiceAccount
logging aggregated-logging-fluentd }.
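As an aside, the reconcile command suggested above only previews changes by default; passing `--confirm` applies them. Note that the "extra" subjects flagged here (`aggregated-logging-fluentd`, `management-admin`) are typically added deliberately by the logging and management installers, so removing them may break those components. A hedged sketch:

```shell
# Preview what would change (nothing is modified without --confirm)
oadm policy reconcile-cluster-role-bindings

# Apply only if you are sure the extra subjects are unwanted
oadm policy reconcile-cluster-role-bindings --confirm
```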
[Note] Running diagnostic: ClusterRoles
Description: Check that the default ClusterRoles are present and contain
the expected permissions
[Note] Running diagnostic: ClusterRouter
Description: Check there is a working router
ERROR: [DClu2007 from diagnostic
ClusterRouter@openshift/origin/pkg/diagnostics/cluster/router.go:156]
The "router" DeploymentConfig exists but has no running pods, so it
is not available. Apps will not be externally accessible via the router.
[Note] Running diagnostic: MasterNode
Description: Check if master is also running node (for Open vSwitch)
Info: Found a node with same IP as master: paas.intrinsic.world
[Note] Running diagnostic: NodeDefinitions
Description: Check node records on master
WARN: [DClu0003 from diagnostic
NodeDefinition@openshift/origin/pkg/diagnostics/cluster/node_definitions.go:112]
Node paas.intrinsic.world is ready but is marked Unschedulable.
This is usually set manually for administrative reasons.
An administrator can mark the node schedulable with:
oadm manage-node paas.intrinsic.world --schedulable=true
While in this state, pods should not be scheduled to deploy on the node.
Existing pods will continue to run until completed or evacuated (see
other options for 'oadm manage-node').
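The registry and router errors above are most likely a direct consequence of this warning: with the cluster's only master-side node marked Unschedulable, those pods have nowhere to land. A sketch of the recovery sequence, assuming the `system:admin` context (on 1.1.x, `oc deploy <dc> --latest` was the way to trigger a fresh rollout):

```shell
# Make the node schedulable again
oadm manage-node paas.intrinsic.world --schedulable=true

# Verify: the node should no longer report SchedulingDisabled
oc get nodes

# Kick off fresh deployments for the registry and router
oc deploy docker-registry --latest -n default
oc deploy router --latest -n default
```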
[Note] Running diagnostic: AnalyzeLogs
Description: Check for recent problems in systemd service logs
Info: Checking journalctl logs for 'origin-master' service
Info: Checking journalctl logs for 'origin-node' service
Info: Checking journalctl logs for 'docker' service
[Note] Running diagnostic: MasterConfigCheck
Description: Check the master config file
Info: Found a master config file: /etc/origin/master/master-config.yaml
WARN: [DH0005 from diagnostic
MasterConfigCheck@openshift/origin/pkg/diagnostics/host/check_master_config.go:58]
Validation of master config file '/etc/origin/master/master-config.yaml'
warned:
assetConfig.loggingPublicURL: Invalid value: "": required to view
aggregated container logs in the console
assetConfig.metricsPublicURL: Invalid value: "": required to view
cluster metrics in the console
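These two warnings are cosmetic unless you use the aggregated logging and metrics consoles; they can be silenced by filling in the URLs in the master config. A fragment with hypothetical hostnames (substitute the routes actually exposed in your cluster):

```yaml
# /etc/origin/master/master-config.yaml (fragment)
assetConfig:
  loggingPublicURL: "https://kibana.apps.intrinsic.world"
  metricsPublicURL: "https://hawkular-metrics.apps.intrinsic.world/hawkular/metrics"
```

Restart `origin-master` after editing for the change to take effect.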
[Note] Running diagnostic: NodeConfigCheck
Description: Check the node config file
Info: Found a node config file: /etc/origin/node/node-config.yaml
[Note] Running diagnostic: UnitStatus
Description: Check status for related systemd units
[Note] Summary of diagnostics execution (version v1.1.6):
[Note] Warnings seen: 3
[Note] Errors seen: 4
> On 05 Sep 2016, at 18:46, Clayton Coleman <[email protected]> wrote:
>
> Did you change the IP of your master, or otherwise delete / alter the
> openshift-infra namespace? Or have your client certificates expired
> (is this cluster 1 year old)?
>
> Before deleting, try two things:
>
> oadm diagnostics
>
> From the master (to see if it identifies anything).
>
> Also check your certificate expiration dates.
>
>> On Sep 5, 2016, at 5:00 AM, Candide Kemmler <[email protected]> wrote:
>>
>> Hi,
>>
>> I have a development server setup made up of two nodes (1 master - 1 slave)
>> running a bunch of different projects and environments which just crashed
>> badly on me.
>>
>> Symptoms are: all containers in all projects are in pending state (orange
>> circle) - when I try to `delete all`, things get removed but pods hang in a
>> 'terminating' state. `oc describe` gives me uninteresting information that I
>> already know (basically that pods are Pending), and `oc logs` tells me it
>> "could not find the requested resource".
>>
>> I tried to `sudo systemctl restart origin-master` as it seems to have
>> produced good results in the past, but that didn't help this time. I also
>> tried that in combination with a full system reboot.
>>
>> Finally, I tried running the Ansible scripts in hopes of updating Origin to
>> the latest version (it's still running 1.1.6), but I got the following error
>> log:
>>
>> failed: [paas.intrinsic.world] => {"changed": false, "cmd": ["oc", "create",
>> "-n", "openshift", "-f",
>> "/usr/share/openshift/examples/image-streams/image-streams-centos7.json"],
>> "delta": "0:00:00.180874", "end": "2016-09-05 07:20:12.050123", "failed":
>> true, "failed_when_result": true, "rc": 1, "start": "2016-09-05
>> 07:20:11.869249", "stdout_lines": [], "warnings": []}
>> stderr: unable to connect to a server to handle "imagestreamlists": the
>> server has asked for the client to provide credentials
>>
>> FATAL: all hosts have already failed -- aborting
>>
>> PLAY RECAP
>> ********************************************************************
>> to retry, use: --limit @/Users/candide/config.retry
>>
>> apps.intrinsic.world : ok=48 changed=0 unreachable=0 failed=0
>> localhost : ok=15 changed=0 unreachable=0 failed=0
>> paas.intrinsic.world : ok=207 changed=0 unreachable=0 failed=1
>>
>> My last option is to reinstall everything from scratch but before I do this
>> I wanted to know if you guys had other ideas on how to get on top of things
>> again.
>>
>> Candide
>>
>> _______________________________________________
>> users mailing list
>> [email protected]
>> http://lists.openshift.redhat.com/openshiftmm/listinfo/users