Hello,

Has anyone on this list run into this issue with OpenShift 3.11?

My hosts file:

[masters]
master.os.serra.local

[etcd]
master.os.serra.local

[nodes]
master.os.serra.local openshift_node_group_name='node-config-master-infra'
node1.os.serra.local openshift_node_group_name='node-config-compute'

[OSEv3:children]
masters
nodes
etcd

[OSEv3:vars]
ansible_user=root
openshift_deployment_type=openshift-enterprise
openshift_master_default_subdomain=apps.os.serra.local
debug_level=2
oreg_auth_user='110xxxx|user1'
oreg_auth_password='XXXXXXXXXXXXXXxxx'
openshift_check_min_host_memory_gb=4


I have already registered with Red Hat for the oreg_auth_user and password.
Both systems are RHEL 7.5 with the latest updates.
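
In case it helps, this is how I sanity-check the registry credentials from the
master before running the installer (a rough check, assuming the default
registry.redhat.io registry and image names that openshift-enterprise 3.11
pulls from):

    # same credentials as oreg_auth_user / oreg_auth_password; docker prompts
    # for the password
    docker login -u '110xxxx|user1' registry.redhat.io

    # pull one of the 3.11 images by hand to confirm the auth itself works
    docker pull registry.redhat.io/openshift3/ose-node:v3.11

That at least tells me whether the pull credentials are valid.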


deploy_cluster.yml output:

TASK [openshift_cluster_monitoring_operator : Set cluster-monitoring-operator template] ***
changed: [master.os.serra.local]

TASK [openshift_cluster_monitoring_operator : Set cluster-monitoring-operator template] ***
changed: [master.os.serra.local]

TASK [openshift_cluster_monitoring_operator : Wait for the ServiceMonitor CRD to be created] ***
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (30 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (29 retries left).
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (28 retries left).
......
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (1 retries left).
fatal: [master.os.serra.local]: FAILED! => {"attempts": 30, "changed": true,
"cmd": ["oc", "get", "crd", "servicemonitors.monitoring.coreos.com", "-n",
"openshift-monitoring",
"--config=/tmp/openshift-cluster-monitoring-ansible-SswP6B/admin.kubeconfig"],
"delta": "0:00:00.274308", "end": "2018-10-14 19:20:36.769452", "msg":
"non-zero return code", "rc": 1, "start": "2018-10-14 19:20:36.495144",
"stderr": "No resources found.\nError from server (NotFound):
customresourcedefinitions.apiextensions.k8s.io
\"servicemonitors.monitoring.coreos.com\" not found", "stderr_lines":
["No resources found.", "Error from server (NotFound):
customresourcedefinitions.apiextensions.k8s.io
\"servicemonitors.monitoring.coreos.com\" not found"], "stdout": "",
"stdout_lines": []}
        to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.retry

PLAY RECAP
*********************************************************************
localhost                  : ok=11   changed=0    unreachable=0    failed=0
master.os.serra.local      : ok=589  changed=264  unreachable=0    failed=1
node1.os.serra.local       : ok=118  changed=61   unreachable=0    failed=0


INSTALLER STATUS
***************************************************************
Initialization               : Complete (0:00:37)
Health Check                 : Complete (0:01:08)
Node Bootstrap Preparation   : Complete (0:14:57)
etcd Install                 : Complete (0:02:50)
Master Install               : Complete (0:05:44)
Master Additional Install    : Complete (0:01:58)
Node Join                    : Complete (0:00:33)
Hosted Install               : Complete (0:01:02)
Cluster Monitoring Operator  : In Progress (0:15:31)
        This phase can be restarted by running:
playbooks/openshift-monitoring/config.yml
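
For reference, the full command I use to re-run just that phase (assuming the
standard /usr/share/ansible/openshift-ansible location shown in the retry hint
above, with "hosts" as my inventory file) is:

    ansible-playbook -i hosts \
        /usr/share/ansible/openshift-ansible/playbooks/openshift-monitoring/config.yml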


Failure summary:


  1. Hosts:    master.os.serra.local
     Play:     Configure Cluster Monitoring Operator
     Task:     Wait for the ServiceMonitor CRD to be created
     Message:  non-zero return code
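
To check the same thing by hand I run these on the master (they mirror the
"oc get crd" call from the failed task; /etc/origin/master/admin.kubeconfig is
the default admin kubeconfig on a 3.11 master, rather than the temporary one
the installer generated):

    export KUBECONFIG=/etc/origin/master/admin.kubeconfig

    # the CRD the installer is waiting for
    oc get crd servicemonitors.monitoring.coreos.com

    # the operator that is supposed to create it
    oc -n openshift-monitoring get pods
    # then, with the cluster-monitoring-operator pod name from the list above:
    # oc -n openshift-monitoring logs <cluster-monitoring-operator-pod>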


In /var/log/messages I get errors that /etc/cni/net.d/ is empty, like:

Oct 14 19:17:58 master atomic-openshift-node: exec openshift start network
--config=/etc/origin/node/node-config.yaml --kubeconfig=/tmp/kubeconfig
--loglevel=${DEBUG_LOGLEVEL:-2}
Oct 14 19:17:58 master atomic-openshift-node: ] Args:[] WorkingDir:
Ports:[{Name:healthz HostPort:10256 ContainerPort:10256 Protocol:TCP
HostIP:}] EnvFrom:[] Env:[{Name:OPENSHIFT_DNS_DOMAIN Value:cluster.local
ValueFrom:nil}] Resources:{Limits:map[] Requests:map[cpu:{i:{value:100
scale:-3} d:{Dec:<nil>} s:100m Format:DecimalSI} memory:{i:{value:209715200
scale:0} d:{Dec:<nil>} s: Format:BinarySI}]}
VolumeMounts:[{Name:host-config ReadOnly:true MountPath:/etc/origin/node/
SubPath: MountPropagation:<nil>} {Name:host-sysconfig-node ReadOnly:true
MountPath:/etc/sysconfig/origin-node SubPath: MountPropagation:<nil>}
{Name:host-var-run ReadOnly:false MountPath:/var/run SubPath:
MountPropagation:<nil>} {Name:host-var-run-dbus ReadOnly:true
MountPath:/var/run/dbus/ SubPath: MountPropagation:<nil>}
{Name:host-var-run-ovs ReadOnly:true MountPath:/var/run/openvswitch/
SubPath: MountPropagation:<nil>} {Name:host-var-run-kubernetes
ReadOnly:true MountPath:/var/run/kubernetes/ SubPath:
MountPropagation:<nil>} {Name:host-var-run-openshift-sdn ReadOnly:false
MountPath:/var/run/openshift-sdn SubPath: MountPropagation:<nil>}
{Name:host-opt-cni-bin ReadOnly:false MountPath:/host/opt/cni/bin SubPath:
MountPropagation:<nil>} {Name:host-etc-cni-netd ReadOnly:false
MountPath:/etc/cni/net.d SubPath: MountPropagation:<nil>}
{Name:host-var-lib-cni-networks-openshift-sdn ReadOnly:false
MountPath:/var/lib/cni/networks/openshift-sdn SubPath:
MountPropagation:<nil>} {Name:sdn-token-8f5tb ReadOnly:true
MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:
MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:nil
ReadinessProbe:nil Lifecycle:nil
TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File
ImagePullPolicy:IfNotPresent
SecurityContext:&SecurityContext{Capabilities:nil,Privileged:*true,SELinuxOptions:nil,RunAsUser:*0,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,RunAsGroup:nil,}
Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that
we should restart it.
Oct 14 19:17:58 master atomic-openshift-node: I1014 19:17:58.931158   17873
kuberuntime_manager.go:757] checking backoff for container "sdn" in pod
"sdn-7fcwv_openshift-sdn(2b9a2a99-cfdb-11e8-85c5-525400975a6b)"
Oct 14 19:17:58 master atomic-openshift-node: I1014 19:17:58.931335   17873
kuberuntime_manager.go:767] Back-off 5m0s restarting failed container=sdn
pod=sdn-7fcwv_openshift-sdn(2b9a2a99-cfdb-11e8-85c5-525400975a6b)
Oct 14 19:17:58 master atomic-openshift-node: E1014 19:17:58.931374   17873
pod_workers.go:186] Error syncing pod 2b9a2a99-cfdb-11e8-85c5-525400975a6b
("sdn-7fcwv_openshift-sdn(2b9a2a99-cfdb-11e8-85c5-525400975a6b)"),
skipping: failed to "StartContainer" for "sdn" with CrashLoopBackOff:
"Back-off 5m0s restarting failed container=sdn
pod=sdn-7fcwv_openshift-sdn(2b9a2a99-cfdb-11e8-85c5-525400975a6b)"
Oct 14 19:18:00 master atomic-openshift-node: W1014 19:18:00.354523   17873
cni.go:172] Unable to update cni config: No networks found in /etc/cni/net.d
Oct 14 19:18:00 master atomic-openshift-node: E1014 19:18:00.354630   17873
kubelet.go:2101] Container runtime network not ready: NetworkReady=false
reason:NetworkPluginNotReady message:docker: network plugin is not ready:
cni config uninitialized
Oct 14 19:18:01 master atomic-openshift-node: E1014 19:18:01.828464   17873
summary.go:102] Failed to get system container stats for
"/system.slice/atomic-openshift-node.service": failed to get cgroup stats
for "/system.slice/atomic-openshift-node.service": failed to get container
info for "/system.slice/atomic-openshift-node.service": unknown container
"/system.slice/atomic-openshift-node.service"
Oct 14 19:18:03 master python: ansible-command Invoked with warn=True exec
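
From those messages it looks like the sdn pod is stuck in CrashLoopBackOff
before it ever writes a CNI config into /etc/cni/net.d, which would explain
the "No networks found" errors. This is roughly what I am checking on the
master (the pod name sdn-7fcwv is taken from the log above):

    # confirm the CNI config directory really is empty
    ls -l /etc/cni/net.d/

    # status and previous-container logs of the crash-looping SDN pod
    oc -n openshift-sdn get pods -o wide
    oc -n openshift-sdn logs sdn-7fcwv --previous

Has anyone hit this on a fresh 3.11 install, or have an idea what keeps the
sdn pod from starting?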