What auth mechanism backs your "admin" user?
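One way to answer that from the master itself is to look at the identityProviders stanza of the master config and then simply log in again, since a token issued to 'admin' can expire independently of anything else. A minimal sketch, assuming the master-config.yaml path and API URL that appear later in this thread:

    # Show which identity provider(s) back regular users such as 'admin'
    grep -A 10 identityProviders /etc/origin/master/master-config.yaml

    # Log in again to replace a possibly expired token, then print the new one
    oc login https://paas.intrinsic.world:8443 -u admin
    oc whoami -t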
On Sep 6, 2016, at 10:19 AM, Candide Kemmler <[email protected]> wrote:

Yes, that seems to be OK..., although I'm not sure I know exactly what the "root cluster cert" is, so I checked all of the following:

[root@paas master]# openssl x509 -enddate -noout -in cloudapps.router.pem
notAfter=Apr 21 16:38:31 2018 GMT
[root@paas master]# openssl x509 -enddate -noout -in ca.crt
notAfter=Apr 20 16:31:56 2021 GMT
[root@paas master]# openssl x509 -enddate -noout -in master.server.crt
notAfter=Apr 21 16:32:00 2018 GMT
[root@paas master]# openssl x509 -enddate -noout -in etcd.server.crt
notAfter=Apr 21 16:32:01 2018 GMT
[root@paas master]# openssl x509 -enddate -noout -in admin.crt
notAfter=Apr 21 16:31:58 2018 GMT
[root@paas master]# openssl x509 -enddate -noout -in ca-bundle.crt
notAfter=Apr 20 16:31:56 2021 GMT
[root@paas master]# openssl x509 -enddate -noout -in openshift-master.crt
notAfter=Apr 21 16:31:57 2018 GMT
[root@paas master]# openssl x509 -enddate -noout -in openshift-registry.crt
notAfter=Apr 21 16:32:00 2018 GMT
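None of the certificates listed above had expired as of the September 2016 date on this thread, so the issue is more likely the token or the client certificate embedded inside a kubeconfig, which can expire independently of the .crt files on disk. A rough way to sweep everything in one pass, assuming the default /etc/origin/master layout already referenced in the thread:

    # Check every certificate under the master config directory in one pass
    for f in /etc/origin/master/*.crt; do
      printf '%s: ' "$f"
      openssl x509 -enddate -noout -in "$f"
    done

    # The client cert inside admin.kubeconfig is stored base64-encoded;
    # pull it out and run it through the same openssl check
    grep client-certificate-data /etc/origin/master/admin.kubeconfig \
      | head -n 1 | awk '{print $2}' | base64 -d | openssl x509 -enddate -noout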
On 06 Sep 2016, at 15:04, Clayton Coleman <[email protected]> wrote:

Were you able to check the expiration date on your admin root cluster cert and verify it has not expired?

On Sep 6, 2016, at 5:19 AM, Candide Kemmler <[email protected]> wrote:

Hi Clayton,

Thanks! Here's the result of running `sudo oadm diagnostics`. I'm particularly bothered by the "the server has asked for the client to provide credentials" message, as I'm seeing this one when I try to execute the ansible scripts as well. Do you know how to solve it? Any other ideas on things I should focus on?

Regards,

Candide

[Note] Determining if client configuration exists for client/cluster diagnostics
Info:  Successfully read a client config file at '/root/.kube/config'
[Note] Could not configure a client, so client diagnostics are limited to testing configuration and connection
Info:  Using context for cluster-admin access: 'default/paas-intrinsic-world:8443/system:admin'
[Note] Performing systemd discovery

[Note] Running diagnostic: ConfigContexts[logging/paas-intrinsic-world:8443/admin]
       Description: Validate client config context is complete and has connectivity

ERROR: [DCli0014 from diagnostic ConfigContexts@openshift/origin/pkg/diagnostics/client/config_contexts.go:285]
       For client config context 'logging/paas-intrinsic-world:8443/admin':
       The server URL is 'https://paas.intrinsic.world:8443'
       The user authentication is 'admin/paas-intrinsic-world:8443'
       The current project is 'logging'
       (*errors.StatusError) the server has asked for the client to provide credentials
       This means that when we tried to make a request to the master API server, the request required
       credentials that were not presented. This can happen with an expired or invalid authentication
       token. Try logging in with this user again.

[Note] Running diagnostic: ConfigContexts[logging/paas-intrinsic-world:8443/system:admin]
       Description: Validate client config context is complete and has connectivity

Info:  For client config context 'logging/paas-intrinsic-world:8443/system:admin':
       The server URL is 'https://paas.intrinsic.world:8443'
       The user authentication is 'system:admin/paas-intrinsic-world:8443'
       The current project is 'logging'
       Successfully requested project list; has access to project(s):
         [openshift-infra dev ieml-demo logging management-infra misc openshift p2p default ieml-dev ...]

[Note] Running diagnostic: ClusterRegistry
       Description: Check that there is a working Docker registry

WARN:  [DClu1009 from diagnostic ClusterRegistry@openshift/origin/pkg/diagnostics/cluster/registry.go:217]
       The "docker-registry-1-8w93s" pod for the "docker-registry" service is not running.
       This may be transient, a scheduling error, or something else.

ERROR: [DClu1001 from diagnostic ClusterRegistry@openshift/origin/pkg/diagnostics/cluster/registry.go:173]
       The "docker-registry" service exists but no pods currently running, so it
       is not available. Builds and deployments that use the registry will fail.

[Note] Running diagnostic: ClusterRoleBindings
       Description: Check that the default ClusterRoleBindings are present and contain the expected subjects

Info:  clusterrolebinding/cluster-admins has more subjects than expected.
       Use the `oadm policy reconcile-cluster-role-bindings` command to update the role binding to remove extra subjects.
Info:  clusterrolebinding/cluster-admins has extra subject {User admin }.
Info:  clusterrolebinding/cluster-readers has more subjects than expected.
       Use the `oadm policy reconcile-cluster-role-bindings` command to update the role binding to remove extra subjects.
Info:  clusterrolebinding/cluster-readers has extra subject {ServiceAccount management-infra management-admin }.
Info:  clusterrolebinding/cluster-readers has extra subject {ServiceAccount logging aggregated-logging-fluentd }.

[Note] Running diagnostic: ClusterRoles
       Description: Check that the default ClusterRoles are present and contain the expected permissions

[Note] Running diagnostic: ClusterRouterName
       Description: Check there is a working router

ERROR: [DClu2007 from diagnostic ClusterRouter@openshift/origin/pkg/diagnostics/cluster/router.go:156]
       The "router" DeploymentConfig exists but has no running pods, so it
       is not available. Apps will not be externally accessible via the router.

[Note] Running diagnostic: MasterNode
       Description: Check if master is also running node (for Open vSwitch)

Info:  Found a node with same IP as master: paas.intrinsic.world

[Note] Running diagnostic: NodeDefinitions
       Description: Check node records on master

WARN:  [DClu0003 from diagnostic NodeDefinition@openshift/origin/pkg/diagnostics/cluster/node_definitions.go:112]
       Node paas.intrinsic.world is ready but is marked Unschedulable.
       This is usually set manually for administrative reasons.
       An administrator can mark the node schedulable with:
         oadm manage-node paas.intrinsic.world --schedulable=true
       While in this state, pods should not be scheduled to deploy on the node.
       Existing pods will continue to run until completed or evacuated (see other options for 'oadm manage-node').

[Note] Running diagnostic: AnalyzeLogs
       Description: Check for recent problems in systemd service logs

Info:  Checking journalctl logs for 'origin-master' service
Info:  Checking journalctl logs for 'origin-node' service
Info:  Checking journalctl logs for 'docker' service

[Note] Running diagnostic: MasterConfigCheck
       Description: Check the master config file

Info:  Found a master config file: /etc/origin/master/master-config.yaml

WARN:  [DH0005 from diagnostic MasterConfigCheck@openshift/origin/pkg/diagnostics/host/check_master_config.go:58]
       Validation of master config file '/etc/origin/master/master-config.yaml' warned:
       assetConfig.loggingPublicURL: Invalid value: "": required to view aggregated container logs in the console
       assetConfig.metricsPublicURL: Invalid value: "": required to view cluster metrics in the console

[Note] Running diagnostic: NodeConfigCheck
       Description: Check the node config file

Info:  Found a node config file: /etc/origin/node/node-config.yaml

[Note] Running diagnostic: UnitStatus
       Description: Check status for related systemd units

[Note] Summary of diagnostics execution (version v1.1.6):
[Note] Warnings seen: 3
[Note] Errors seen: 4
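The registry and router errors and the Unschedulable warning are consistent with each other: if the only schedulable node is the one that was cordoned, nothing can be placed anywhere, which would also explain every pod sitting in Pending. A possible recovery sequence, sketched under the assumption that the registry and router live in the default project (the usual layout for an ansible-installed cluster):

    # Let the combined master/node accept pods again
    oadm manage-node paas.intrinsic.world --schedulable=true

    # Kick off fresh deployments of the registry and router
    oc deploy docker-registry --latest -n default
    oc deploy router --latest -n default

    # Watch the pods come back up
    oc get pods -n default -w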
On 05 Sep 2016, at 18:46, Clayton Coleman <[email protected]> wrote:

Did you change the IP of your master, or otherwise delete / alter the openshift-infra namespace? Or have your client certificates expired (is this cluster 1 year old)?

Before deleting, try two things:

  oadm diagnostics

from the master (to see if it identifies anything). Also check your certificate expiration.

On Sep 5, 2016, at 5:00 AM, Candide Kemmler <[email protected]> wrote:

Hi,

I have a development server setup made up of two nodes (1 master - 1 slave) running a bunch of different projects and environments, which just crashed badly on me.

Symptoms are: all containers in all projects are in Pending state (orange circle); when I try to `delete all`, things get removed but pods hang in a 'terminating' state. oc describe gives me uninteresting information that I already know (basically that pods are Pending) and oc logs tells me that it "could not find the requested resource".

I tried to `sudo systemctl restart origin-master` as it seems to have produced good results in the past, but that didn't help this time. I also tried that in combination with a full system reboot.

Finally, I tried running the ansible scripts in hopes of updating origin to the latest version (it's still running 1.1.6), but I got the following error log:

failed: [paas.intrinsic.world] => {"changed": false, "cmd": ["oc", "create", "-n", "openshift", "-f", "/usr/share/openshift/examples/image-streams/image-streams-centos7.json"], "delta": "0:00:00.180874", "end": "2016-09-05 07:20:12.050123", "failed": true, "failed_when_result": true, "rc": 1, "start": "2016-09-05 07:20:11.869249", "stdout_lines": [], "warnings": []}
stderr: unable to connect to a server to handle "imagestreamlists": the server has asked for the client to provide credentials

FATAL: all hosts have already failed -- aborting

PLAY RECAP ********************************************************************
           to retry, use: --limit @/Users/candide/config.retry

apps.intrinsic.world       : ok=48   changed=0   unreachable=0   failed=0
localhost                  : ok=15   changed=0   unreachable=0   failed=0
paas.intrinsic.world       : ok=207  changed=0   unreachable=0   failed=1

My last option is to reinstall everything from scratch, but before I do this I wanted to know if you guys had other ideas on how to get on top of things again.
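The ansible failure is the same "the server has asked for the client to provide credentials" error that the diagnostics reported for the 'admin' context, which suggests the playbook is running oc against a kubeconfig whose token is no longer accepted. One way to confirm, sketched on the assumption that the installer-generated admin.kubeconfig is still in its default location, is to authenticate with the system:admin client certificate (the context that did work in the diagnostics) and rerun the failing step by hand:

    # Authenticate with the system:admin client certificate instead of the stale token
    export KUBECONFIG=/etc/origin/master/admin.kubeconfig
    oc whoami    # should print system:admin

    # Re-run the command that failed inside the playbook
    oc create -n openshift \
      -f /usr/share/openshift/examples/image-streams/image-streams-centos7.json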
Candide
_______________________________________________
users mailing list
[email protected]
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
